Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Sanitize Client-Side: Why Server-Side HTML Sanitization is Doomed to Fail - 1613

Yaniv Nizry - Sonar Source - Posted 1 Year Ago
  • Cross-site scripting (XSS) is a super common web vulnerability. If an attacker can inject HTML into a page, they can usually add their own JavaScript to perform malicious actions. Sometimes, though, some HTML has to be allowed for styling. In those cases, HTML sanitizers are super important for preventing security issues.
  • These sanitizers work by parsing the HTML input into a structured DOM tree, then walking that tree to ensure that nothing defined as malicious exists. This sanitizing should be done on the client side in order to prevent parser differential issues, where the sanitizer's parser and the browser's parser disagree about the same markup. In reality, it's done on the server side quite a bit.
  • Sonar Source has found a lot of sanitizer bypasses in the past. They noticed that one group of these bypasses worked against almost all of the sanitizers written in PHP. All of the bypasses related to comments, math, RCDATA and RAWTEXT. All of these depend on HTML5 parsing behavior!
  • The built-in PHP HTML parser comes from libxml2 and follows the out-of-date HTML 4 specification. If the parser used for cleaning were the same as the one used for rendering (the browser's), this issue wouldn't have existed. It's just a standard though, so how hard can this really be?
  • HTML is a constantly evolving language. New elements, attributes and features are regularly introduced. Different users are also running different versions of browsers, which causes some complications here. The author claims that the parser configuration can make a big difference as well: whether scripting is enabled can determine how some elements are parsed.
  • Another issue surrounds parsing weird HTML. If markup is parsed and serialized more than once, the output of one pass may differ from the output of the next. Additionally, mutation XSS (mXSS) can be used too.
  • The issue in PHP was never fixed. Instead, this big PHP library now has a big red warning label saying that it shouldn't be used for sanitizing because it doesn't support HTML5 very well. Overall, a good post about a bad practice and an interesting vulnerability in the improper usage of a library.
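
To make the comment-based parser differential from these notes concrete, here is a minimal Python sketch. It is not code from the Sonar post: the function name `markup_outside_comments`, the `html5_rules` flag and the payload are my own assumptions for illustration, and the "legacy" rule is a simplification of how an HTML 4 style parser might find the end of a comment. The HTML5 side only models the rule that `<!-->` and `<!--->` are treated as complete, empty comments.

```python
def markup_outside_comments(html: str, html5_rules: bool) -> str:
    """Return the parts of `html` that a parser would NOT treat as comment text.

    html5_rules=False: hypothetical legacy rule, a comment always runs from
                       '<!--' to the next '-->' found after the opener.
    html5_rules=True:  additionally treat '<!-->' and '<!--->' as complete,
                       empty comments, as the HTML5 tokenizer does.
    """
    kept = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            kept.append(html[pos:])  # no more comments: keep the remainder
            break
        kept.append(html[pos:start])  # keep the markup before the comment
        body = start + 4              # index just past '<!--'
        if html5_rules and html.startswith(">", body):
            pos = body + 1            # '<!-->' is already a closed comment
            continue
        if html5_rules and html.startswith("->", body):
            pos = body + 2            # '<!--->' is already a closed comment
            continue
        end = html.find("-->", body)
        if end == -1:
            break                     # unterminated comment swallows the rest
        pos = end + 3
    return "".join(kept)


# Illustrative payload (an assumption, not quoted from the article).
payload = "<!--><img src=x onerror=alert(1)>-->"

server_view = markup_outside_comments(payload, html5_rules=False)
browser_view = markup_outside_comments(payload, html5_rules=True)

print(repr(server_view))   # ''
print(repr(browser_view))  # '<img src=x onerror=alert(1)>-->'
```

Under the legacy rule the whole payload is one harmless comment, so a server-side sanitizer would find nothing to remove. Under the HTML5 rule the comment ends after `<!-->` and the `<img>` tag is live markup. Sanitizing with the browser's own parser avoids this kind of disagreement, since the check and the rendering then use the same rules.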