Security Tip: Validating HTML & Markdown Input!
[Tip#22] Validating user input is easy enough to forget, even before you add HTML or Markdown into the mix!
Let's start with this question:
Curious as to what validation strategy can be applied when saving input from a text editor (ie ckeditor, quill) that is html markup?
My initial thinking is that it would have to be a regex pattern that includes all the tags that are enabled, or something along those lines.
I love the way they're thinking about the output of the editor and how to protect against Cross-Site Scripting (XSS) attacks. It's far too easy to assume that an editor like CKEditor won't allow the user to submit an XSS payload (you can modify it in the browser), or that because Markdown isn't HTML you can't inject HTML (you can, and it's even allowed in the spec!). So you very much do need to be thinking like this, and planning how to defend against XSS in your user input.
However, you can't simply reach for something like a regex to solve the problem [1]. You'll have a lot of trouble writing a regex to match all possible XSS payloads without also squashing legitimate tags. Just take a quick browse through this XSS cheat sheet and you'll realise the wide scope of the task...
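To make that concrete, here is a minimal sketch of the kind of regex filter people tend to reach for, and two payloads that get past it. The function name naiveStripScripts() and the pattern are made up for illustration, not taken from any library.

```php
<?php
// Hypothetical naive filter: remove <script>...</script> blocks with a regex.
// This is an illustration of what NOT to rely on.
function naiveStripScripts(string $html): string
{
    return preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html);
}

// Bypass 1: no <script> tag at all, the payload lives in an event handler attribute.
$viaAttribute = naiveStripScripts('<img src=x onerror=alert(1)>');
// The input comes back unchanged, onerror handler included.

// Bypass 2: nest a decoy tag so that removing it assembles a real one.
$viaNesting = naiveStripScripts('<scr<script></script>ipt>alert(1)</scr<script></script>ipt>');
// The two empty decoy blocks are removed, leaving: <script>alert(1)</script>

echo $viaAttribute, PHP_EOL;
echo $viaNesting, PHP_EOL;
```

You could keep patching the pattern for each new payload, but every patch risks breaking legitimate markup, which is the trade-off described above.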
That said, the solution doesn't have to be hard. This is such a common problem that it's been solved many times before.
HTML Purifier
If you're receiving raw HTML from the user, then you can pass it through an HTML purifier. A purifier deconstructs the HTML and strips out everything you haven't specifically allowed, so you can be very specific about what you let your users use.
This is the one I've used before, and it seems to be by far the most popular one on Packagist: https://github.com/ezyang/htmlpurifier
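As a rough sketch of what that looks like in practice, assuming ezyang/htmlpurifier has been installed with Composer: the helper name purifyUserHtml() and the allowlist of tags are only examples, so adjust them to whatever your editor is configured to produce.

```php
<?php
// Assumes ezyang/htmlpurifier was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical helper; the allowlist below is an example, not a recommendation.
function purifyUserHtml(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Allowlist of elements and attributes. Anything not listed is stripped.
    $config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

    return (new HTMLPurifier($config))->purify($dirty);
}

$dirty = '<p onclick="steal()">Hi <strong>there</strong></p>'
    . '<script>alert("Hello XSS!");</script>';

// The onclick attribute and the script element are not in the allowlist,
// so neither should survive purification.
echo purifyUserHtml($dirty), PHP_EOL;
```

Run this on the server when the input is saved or before it is rendered. Restricting the editor toolbar in the browser is not enough, since the submitted HTML can be modified before it reaches you.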
Stripping HTML in Markdown
If you're receiving Markdown, you should use a converter that includes the option to strip out all HTML when converting. This will save you having to pass the rendered Markdown into a purifier and double handling the data.
The one I recommend is CommonMark, which follows the CommonMark Spec.
It is important to note that part of the spec is that raw HTML is allowed, so you'll want to read their security page and ensure you're enabling the security features:
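use League\CommonMark\CommonMarkConverter;

$converter = new CommonMarkConverter([
    // Escape raw HTML in the input instead of passing it through.
    'html_input' => 'escape',
    // Do not render links with unsafe protocols such as javascript:.
    'allow_unsafe_links' => false,
]);

echo $converter->convert('<script>alert("Hello XSS!");</script>');
// &lt;script&gt;alert("Hello XSS!");&lt;/script&gt;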
Laravel's Markdown Helper
Some of you may know that Laravel includes a Markdown helper in the String class (Str::markdown()), but you might not be aware that by default it does not strip out raw HTML.
Laravel uses CommonMark internally, so you can just pass the converter options when you're converting your Markdown:
use Illuminate\Support\Str;

Str::markdown('Inject: <script>alert("Hello XSS!");</script>', [
    // Remove raw HTML tags from the input entirely.
    'html_input' => 'strip',
    'allow_unsafe_links' => false,
]);
// <p>Inject: alert(&quot;Hello XSS!&quot;);</p>
Therefore, if you're dealing with a WYSIWYG, raw HTML, or Markdown, don't forget to secure it! Use a purifier or configure the server-side parser, whichever makes sense for your use case - just make sure you secure it!
[1] Can you ever "simply" use a regex?