How to prevent XSS attacks when I need to render HTML from a WYSIWYG editor?

8👍

You need to parse the HTML on the server and remove any tags and attributes that don’t meet a strict whitelist.
You should parse it (or at least re-render it) as strict XML to prevent attackers from exploiting differences between fuzzy parsers.

The whitelist must not include <script>, <style>, <link>, or <meta>, and must not include event handler attributes or style="".
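
A minimal sketch of that tag and attribute check, assuming the HTML has already been parsed into tag names and attribute dicts. The names `ALLOWED_ATTRS`, `FORBIDDEN_TAGS` and `filter_attributes` are illustrative only, and the allowlist contents are an example rather than a recommendation:

```python
# Hypothetical allowlist: tag name -> attribute names permitted on that tag.
ALLOWED_ATTRS = {
    "a": {"href", "rel"},
    "img": {"src", "alt"},
    "p": set(),
    "b": set(),
    "i": set(),
}

# Tags that are rejected even if someone adds them to the allowlist by mistake.
FORBIDDEN_TAGS = {"script", "style", "link", "meta"}


def filter_attributes(tag, attrs):
    """Return the attributes allowed on `tag`, or None if the tag itself is rejected."""
    tag = tag.lower()
    if tag in FORBIDDEN_TAGS or tag not in ALLOWED_ATTRS:
        return None
    kept = {}
    for name, value in attrs.items():
        name = name.lower()
        # Event handlers (onclick, onerror, ...) and inline styles are always dropped.
        if name.startswith("on") or name == "style":
            continue
        if name in ALLOWED_ATTRS[tag]:
            kept[name] = value
    return kept
```

For example, `filter_attributes("img", {"src": "/a.png", "onerror": "alert(1)"})` keeps only `src`, and `filter_attributes("script", {})` returns `None`. The values of `href` and `src` still need their own check.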

You must also parse URLs in href="" and src="" and make sure that they are either relative paths, http://, or https://.
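
The URL check could look roughly like this, using only the standard library. `is_safe_url` and `ALLOWED_SCHEMES` are illustrative names, and rejecting protocol-relative URLs (`//host/path`) is an extra assumption on my part, since they are not strictly relative paths:

```python
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

# Characters browsers ignore at the ends of a URL: C0 control characters and space.
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))


def is_safe_url(url):
    """Accept relative paths and http(s) URLs; reject everything else."""
    # Browsers drop tabs and newlines anywhere in a URL, so remove them before
    # parsing; otherwise "java\tscript:..." could slip through.
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n").strip(_C0_AND_SPACE)
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        return False
    if parts.scheme:
        return parts.scheme.lower() in ALLOWED_SCHEMES
    if parts.netloc:
        # Protocol-relative URL such as //example.com/x (rejected by assumption).
        return False
    # Defence in depth: a colon before the first "/", "?" or "#" would mean a
    # scheme that the parser did not recognise.
    first_segment = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    return ":" not in first_segment
```

With this sketch, `/images/cat.png` and `https://example.com/a` pass, while `javascript:alert(1)` and `data:text/html;base64,AAAA` are rejected.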

👤SLaks

16👍

This is late, but you can try Bleach. Under the hood it uses html5lib, and you also get tag balancing.

Here is a complete snippet:

settings.py

BLEACH_VALID_TAGS = ['p', 'b', 'i', 'strike', 'ul', 'li', 'ol', 'br',
                     'span', 'blockquote', 'hr', 'a', 'img']
BLEACH_VALID_ATTRS = {
    'span': ['style', ],
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = ['color', 'cursor', 'float', 'margin']

app/forms.py

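import bleach
from django import forms
from django.conf import settings

class MyModelForm(forms.ModelForm):
    myfield = forms.CharField(widget=MyWYSIWYGEditor)

    class Meta:
        model = MyModel
        fields = ('myfield',)  # Django 1.8+ requires an explicit fields (or exclude)

    def clean_myfield(self):
        myfield = self.cleaned_data.get('myfield', '')
        # Sanitize the HTML: markup outside the whitelists is escaped (Bleach's default).
        # Note: the styles argument exists only in Bleach < 5.0; newer releases
        # use css_sanitizer instead.
        cleaned_text = bleach.clean(
            myfield,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            styles=settings.BLEACH_VALID_STYLES,
        )
        return cleaned_text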

You can read the bleach docs, so you can adapt it to your needs.

👤nitely

1👍

Adding to nitely’s answer, which was great but slightly incomplete: I also recommend Bleach, but if you want it to pre-approve safe CSS styles, you need the Bleach CSS sanitizer. It is an optional extra on top of the vanilla bleach package (installed with pip install 'bleach[css]') and makes for a slightly different code set-up from nitely’s.

We use the below in our Django project forms.py file (using Django-CKEditor as the content widget) to sanitize the data for our user-input ReportPages.

import bleach
from bleach.css_sanitizer import CSSSanitizer
from ckeditor.widgets import CKEditorWidget
from django import forms
from django.conf import settings

from .models import ReportPage  # assumed location of the model

css_sanitizer = CSSSanitizer(allowed_css_properties=settings.BLEACH_VALID_STYLES)

class ReportPageForm(forms.ModelForm):
    content = forms.CharField(widget=CKEditorWidget())

    class Meta:
        model = ReportPage
        fields = ('name', 'content')

    def clean_content(self):
        content = self.cleaned_data['content']
        # Remove (rather than escape) anything outside the whitelists.
        cleaned_content = bleach.clean(
            content,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            protocols=settings.BLEACH_VALID_PROTOCOLS,
            css_sanitizer=css_sanitizer,
            strip=True,
        )
        # A clean_<field>() method must return the cleaned value.
        return cleaned_content

We pass strip=True so that disallowed mark-up is removed from the form content instead of being escaped (the default). We also pass protocols so that any href attrs (for ‘a’ tags) and src attrs (for ‘img’ tags) must be https. By default Bleach allows http, https and mailto, and we wanted http and mailto turned off.

For completeness’ sake, inside our settings.py file we define the following as valid mark-up for our purposes:

BLEACH_VALID_TAGS = (
    'a', 'abbr', 'acronym', 'b', 'blockquote', 'br', 'code', 
    'dd', 'div', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 
    'hr', 'i', 'img', 'li', 'ol', 'p', 'pre', 'span', 'strike', 
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 
    'thead', 'tr', 'tt', 'u', 'ul'
)
    
BLEACH_VALID_ATTRS = {
    '*': ['style', ], # allow all tags to have style attr
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}

BLEACH_VALID_STYLES = (
    'azimuth', 'background-color', 'border', 'border-bottom-color',
    'border-collapse', 'border-color', 'border-left-color',
    'border-right-color', 'border-top-color', 'clear',
    'color', 'cursor', 'direction', 'display', 'elevation', 'float',
    'font', 'font-family', 'font-size', 'font-style', 'font-variant',
    'font-weight', 'height', 'letter-spacing', 'line-height', 
    'margin', 'margin-bottom', 'margin-left', 'margin-right', 
    'margin-top', 'overflow', 'padding', 'padding-bottom', 
    'padding-left', 'padding-right', 'padding-top', 'pause', 
    'pause-after', 'pause-before', 'pitch', 'pitch-range',
    'richness', 'speak', 'speak-header', 'speak-numeral',
    'speak-punctuation', 'speech-rate', 'stress', 'text-align',
    'text-decoration', 'text-indent', 'unicode-bidi', 
    'vertical-align', 'voice-family', 'volume', 'white-space', 'width'
)

BLEACH_VALID_PROTOCOLS = ('https',)

0👍

@SLaks is right that you need to do the sanitization on the server since students who steal a teacher’s credentials could use those credentials to POST directly to your server.

The question “Python HTML sanitizer / scrubber / filter” discusses existing HTML sanitizers available for Python.

I would suggest starting with an empty whitelist, then using the WYSIWYG editor to create a snippet of HTML with each button so that you know the varieties of HTML it produces, and then whitelisting only the tags and attributes needed to support that HTML. Hopefully it doesn’t use the CSS style attribute, because that can also be an XSS vector.
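
One way to take that inventory is to feed the editor’s sample output through the standard library’s HTML parser and record what it sees. This is only a sketch: `TagInventory` and `inventory` are made-up names, and the sample snippets in the usage note are invented rather than taken from any particular editor:

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Record every tag and attribute name seen in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.seen = defaultdict(set)  # tag name -> attribute names seen on it

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        # Self-closing tags such as <br/> also end up here by default.
        self.seen[tag].update(name for name, _value in attrs)


def inventory(snippets):
    """Return {tag: sorted attribute names} for all snippets combined."""
    parser = TagInventory()
    for snippet in snippets:
        parser.feed(snippet)
    parser.close()
    return {tag: sorted(names) for tag, names in sorted(parser.seen.items())}
```

Calling `inventory(['<p align="center">Hi <b>there</b></p>', '<a href="https://example.com" rel="nofollow">x</a><br/>'])` yields the tags `a`, `b`, `br` and `p` with the attributes each one used, which is a starting point for the whitelist. This parser is only for surveying trusted sample output, not for sanitizing user input.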
