Laravel Security In Depth
larasec.substack.com

Security Tip: Validating HTML & Markdown Input

[Tip#22] Validating user input is easy to forget without adding HTML or Markdown into the mix!

Stephen Rees-Carter
May 20

Greetings my friends, I hope you’re all having an awesome week! The tip this week comes from a question a reader asked on the Validating User Input post from November last year. It was such a great question that I wanted to share it (or more specifically, the answer) with everyone.

Before we get into the security tip, I would love to welcome all of our new subscribers. It’s really awesome to have you here! 😁 I just checked the subscriber numbers and we’re almost at 900 subscribers[1]. I’d love to hit 900 by the end of May[2], so please share Laravel Security in Depth with all of your developer friends.


Btw, I’m fully booked for Laravel Security Audits until November, so if you want to get in before the end of the year, you’ll want to reach out soon. 🕵️


Validating HTML & Markdown Input

Let’s start with this question:

Curious as to what validation strategy can be applied when saving input from a text editor (e.g. CKEditor, Quill) that is HTML markup?

My initial thinking is that it would have to be a regex pattern that includes all the tags that are enabled, or something along those lines.

I love the way they’re thinking about the output of the editor and how to protect against Cross-Site Scripting (XSS) attacks. It’s far too easy to assume that an editor like CKEditor won’t let the user submit an XSS payload (it can’t stop them: the editor runs in the user’s browser, so they can modify what it sends), or that because Markdown isn’t HTML you can’t inject HTML (you can, and it’s even allowed in the spec!). So you very much do need to be thinking like this, and planning how to defend against XSS in your user input.

However, you can’t simply reach for something like a regex to solve the problem (can you ever “simply” use regex?). You’ll have a lot of trouble writing a regex to match all possible XSS payloads without also squashing legitimate tags. Just take a quick browse through this XSS cheat sheet and you’ll realise the wide scope of the task…
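
To make the problem concrete, here is a minimal sketch of the regex approach: a hypothetical naiveStripScripts() function that deletes <script>...</script> blocks. It catches the obvious payload, but misses an event-handler attribute, a javascript: URL, and a nested tag that reassembles itself once the inner match is removed.

```php
<?php

// Hypothetical, deliberately naive "sanitizer": it tries to block XSS by
// deleting <script>...</script> blocks with a regex. Shown only to
// illustrate why this approach fails; do not use it.
function naiveStripScripts(string $html): string
{
    return preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html);
}

$payloads = [
    // Caught: a plain <script> block is removed entirely.
    '<script>alert("XSS")</script>',
    // Missed: there is no <script> tag, the payload lives in an event handler.
    '<img src="x" onerror="alert(\'XSS\')">',
    // Missed: the payload lives in a javascript: URL.
    '<a href="javascript:alert(\'XSS\')">Click me</a>',
    // Missed: removing the inner <script></script> reassembles an outer <script> tag.
    '<scr<script></script>ipt>alert("XSS")</script>',
];

foreach ($payloads as $payload) {
    echo naiveStripScripts($payload), PHP_EOL;
}
```

Only the first payload is neutralised; the other three come out still executable, and that is before getting into the long tail of encoding tricks.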

That said, the solution doesn’t have to be hard. This is such a common problem that it’s been solved many times before. 😁

HTML Purifier

If you’re receiving raw HTML from the user, you can pass it through an HTML purifier. A purifier parses the HTML and strips out everything you haven’t explicitly allowed, so you can be very specific about which tags and attributes your users can use.

This is the one I’ve used before, and it seems to be by far the most popular one on Packagist: https://github.com/ezyang/htmlpurifier
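
A minimal sketch of HTML Purifier with an explicit allowlist, assuming it was installed with Composer (composer require ezyang/htmlpurifier). The allowed tag list is only an example; yours should match what your editor actually produces. The definition cache is disabled purely to keep the sketch self-contained.

```php
<?php

// Assumes: composer require ezyang/htmlpurifier
require __DIR__ . '/vendor/autoload.php';

// Start from the defaults and narrow them down to an explicit allowlist.
$config = HTMLPurifier_Config::createDefault();

// Only these elements survive, and <a> may only keep its href attribute.
// Example list only: match it to what your editor produces.
$config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

// Disable the on-disk definition cache so this sketch needs no writable
// directory. In production you would normally keep caching enabled.
$config->set('Cache.DefinitionImpl', null);

$purifier = new HTMLPurifier($config);

$dirty = '<p onclick="alert(1)">Hello <strong>world</strong></p>'
    . '<script>alert("XSS")</script>'
    . '<a href="javascript:alert(1)">click me</a>';

// Disallowed elements and attributes (and the javascript: URL) are removed.
$clean = $purifier->purify($dirty);

echo $clean, PHP_EOL;
```

Purifying once, when the input is saved, keeps the cost of HTML Purifier out of every page render.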

Stripping HTML in Markdown

If you’re receiving Markdown, use a converter that has an option to strip out all raw HTML during conversion. That saves you from also passing the rendered Markdown through a purifier and handling the data twice.

The one I recommend is league/commonmark, which follows the CommonMark spec.

It’s important to note that the spec allows raw HTML, and the package’s defaults follow it (html_input is 'allow' and allow_unsafe_links is true), so you’ll want to read their security page and make sure you enable the security features:

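```php
<?php

use League\CommonMark\CommonMarkConverter;

require __DIR__ . '/vendor/autoload.php';

$converter = new CommonMarkConverter([
    'html_input' => 'escape',       // render any raw HTML as visible text
    'allow_unsafe_links' => false,  // don't output javascript:, vbscript:, file: or most data: URLs
]);

echo $converter->convert('<script>alert("Hello XSS!");</script>');

// &lt;script&gt;alert("Hello XSS!");&lt;/script&gt;
```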
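
To see the difference between the 'strip' and 'escape' modes, and what allow_unsafe_links changes, the sketch below renders the same input both ways. It assumes league/commonmark 2.x installed with Composer; renderMarkdown() is just an illustrative name. With 'strip' the raw HTML disappears entirely, with 'escape' it is shown as text, and in both cases the javascript: link is rendered without its href.

```php
<?php

use League\CommonMark\CommonMarkConverter;

// Assumes: composer require league/commonmark (2.x)
require __DIR__ . '/vendor/autoload.php';

// Hypothetical helper: renders Markdown with the given html_input mode
// ('strip' or 'escape') and unsafe links disabled.
function renderMarkdown(string $markdown, string $htmlInput): string
{
    $converter = new CommonMarkConverter([
        'html_input' => $htmlInput,
        'allow_unsafe_links' => false, // risky links are rendered without an href
    ]);

    return (string) $converter->convert($markdown);
}

$markdown = "Hello <b>there</b>\n\n"
    . "<script>alert('XSS')</script>\n\n"
    . "[click me](javascript:alert('XSS'))\n";

echo "strip:\n", renderMarkdown($markdown, 'strip'), "\n";
echo "escape:\n", renderMarkdown($markdown, 'escape'), "\n";
```

'strip' is usually the better fit for user content, since escaped tags still show up on the page as literal text.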

Laravel’s Markdown Helper

Some of you may know that Laravel includes a Markdown helper on the Str class (Str::markdown()), but you might not be aware that by default it does not strip out raw HTML.

Laravel uses CommonMark internally, so you can pass the same converter options when you’re converting your Markdown:

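```php
use Illuminate\Support\Str;

// Str::markdown() passes these options straight through to CommonMark.
echo Str::markdown('Inject: <script>alert("Hello XSS!");</script>', [
    'html_input' => 'strip',        // remove raw HTML entirely
    'allow_unsafe_links' => false,  // drop links with risky schemes such as javascript:
]);

// <p>Inject: alert(&quot;Hello XSS!&quot;);</p>
```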
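
Because the defaults allow raw HTML, it can help to wrap the call in a small helper so the safe options are applied everywhere and can’t be forgotten. The safeMarkdown() name below is hypothetical (it is not part of Laravel), and the sketch assumes illuminate/support and league/commonmark are installed with Composer; inside a Laravel app the require line isn’t needed.

```php
<?php

use Illuminate\Support\Str;

// Assumes a Composer install that includes illuminate/support and league/commonmark.
require __DIR__ . '/vendor/autoload.php';

/**
 * Hypothetical helper (not part of Laravel): renders user-supplied Markdown
 * with raw HTML and unsafe links removed, so callers can't forget the options.
 */
function safeMarkdown(string $markdown): string
{
    return Str::markdown($markdown, [
        'html_input' => 'strip',        // remove raw HTML entirely
        'allow_unsafe_links' => false,  // drop javascript:, vbscript:, file: and most data: URLs
    ]);
}

echo safeMarkdown('Hi <img src="x" onerror="alert(1)"> [click](javascript:alert(1))');
```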

As you can see, it’s actually fairly easy to validate and clean raw HTML and Markdown inputs with the right tools.

So you’ve got no excuse. 🤣

[1] Counting both free and paid subscribers.

[2] 31st May will be 9 months since the launch of LSID (Laravel Security in Depth), so hitting 900 subscribers by then really appeals to my love of patterns.

Comments

Ralph M. Rivera, May 20:

What is the difference between "validating" and "sanitizing"? I always understood "validation" as the process of confirming that the input data meets a series of standards, while "sanitizing" is the process of removing unwanted and potentially dangerous data while preserving safe data.

Expand full comment
ReplyGive giftCollapse
1 reply by Stephen Rees-Carter
1 more comments…
TopNewCommunity

No posts

Ready for more?

© 2022 Stephen Rees-Carter