Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Gregor Samsa: Exploiting Java's XML Signature Verification- 993

Felix Wilhelm - Project Zero. Posted 3 Years Ago
  • While reviewing the Java standard library, the author came across a strange attack surface: a custom JIT compiler for XSLT programs. The reason this looked so juicy was that this was exposed to remote attackers during XML signature verification with things like SAML used for SSO.
  • While most signature schemes work on a raw byte stream, XML signatures follow a standard known as XMLDsig. This standard attempts to be robust against small changes to the signed document, such as whitespace changes.
  • The signature appears in its own XML tag within the document with many special fields including the information that is signed, information about the key and many other things. The verification is done in two steps: reference validation (transforms on the document itself) and signature validation.
  • Among the transforms supported by XMLDsig is Extensible Stylesheet Language Transformations (XSLT). This is a programming language whose purpose is to modify an existing XML document. It can do things like request remote data and edit the document itself.
  • In Java, signature validation is done first, and then the transformations occur. Java iterates through each of the transformations and performs them on the XML document. This calls a module called XSLTC, which is an XSLT compiler from the Apache Xalan project.
  • The compilation takes the XSLT stylesheet as input and returns a JIT'ed Java class. The JVM loads this class, instantiates it and runs its code. The library depends on the Apache Byte Code Engineering Library (BCEL) to dynamically create the Java classes. Constants get stored in the constant pool and smaller integers get stored inline.
  • The constant pool count is stored in a field that is only 2 bytes wide. However, neither XSLTC nor BCEL considers this constraint, so the count overflows once the pool grows past 65535 entries. When BCEL writes out the internal class representation, all of the constants are emitted but the length is truncated.
  • Practically, an attacker can force constant_pool_count to a small value, meaning that the rest of the pool will be interpreted as method and attribute definitions. Each pool entry starts with a 1-byte tag describing the type of constant, which is followed by the actual data. How do we exploit this, though?
  • There's no dynamically sized value with completely controlled content. Although there are strings, they are in a modified UTF-8 without null bytes. CONSTANT_Double entries can be used to create floats with nearly arbitrary content. This gives quite a bit of control, but a 0x06 tag byte still precedes each 8-byte value, which complicates spoofing the fields directly after the constant pool.
  • To make this work, we need to spoof the metadata fields after the pool properly. With deep knowledge of the fields (and much trial and error), Felix found a good way to spoof them using a combination of UTF-8 entries and doubles. With the initial headers made, we can get to the interesting part of the class file definition: the methods table, which holds the method and bytecode definitions.
  • To add the proper code, we need to align ourselves properly to create the code for a constructor. With this, we can set the bytecode of a function that will be executed. Additionally, to reference classes in Java, we can include an XSLT snippet. Boom, code execution!
  • Java verifies the signature before running the transforms. So, what's the big deal? If an attacker can supply their own key, which is particularly common with multi-tenant SAML, then this code path can still be hit. Additionally, secureValidation forbids XSLT transforms but is turned off by default.
  • Overall, an amazing post. A few good things I learned from this:
    • Studying technology for unknown but powerful attack surfaces is worth the effort.
    • Error-tolerant systems are much easier to exploit. Systems that exit strictly on errors are harder to attack with limited primitives.
    • Memory safe languages still suffer from many problems, including issues with integers, as shown in this post.
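
The 2-byte count truncation described above can be illustrated with a small sketch. It assumes the count is written the way `DataOutputStream.writeShort` writes any int (keeping only the low 16 bits); the class and method names are hypothetical and are not taken from XSLTC or BCEL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration: what a class-file reader sees when a pool
// count larger than 16 bits is written into a 2-byte field.
class ConstantPoolTruncation {

    // Writes the count as a 2-byte value and reads it back as an
    // unsigned short, the way a class-file parser would.
    static int countAsWritten(int entries) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(entries); // silently drops the high 16 bits
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countAsWritten(65535));      // prints 65535
        System.out.println(countAsWritten(65536 + 10)); // prints 10
    }
}
```

With 65546 entries, the reader believes the pool holds only 10, so everything written after those entries gets parsed as whatever structure comes next in the class file.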
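
To see why doubles give "nearly arbitrary" content, here is a rough sketch of how one CONSTANT_Double entry is laid out: a tag byte of 6 followed by 8 bytes of IEEE 754 data. It assumes the entry is serialized with `DataOutputStream.writeDouble`, which canonicalizes NaN values; the class and helper names are made up for illustration, and a real attacker would have to supply the value as a number in the XSLT stylesheet rather than as raw bits.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the 9-byte CONSTANT_Double layout.
class DoubleEntrySketch {
    static final int CONSTANT_DOUBLE_TAG = 6; // tag value from the class file format

    // Encodes one pool entry whose payload should be the given 8 bytes.
    static byte[] encode(long desiredBits) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(CONSTANT_DOUBLE_TAG);
            // writeDouble goes through Double.doubleToLongBits, so every
            // NaN bit pattern collapses to the canonical NaN.
            out.writeDouble(Double.longBitsToDouble(desiredBits));
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Ordinary bit patterns survive untouched behind the tag byte.
        System.out.println(hex(encode(0x4141414141414141L))); // prints 064141414141414141
        // NaN patterns are rewritten, so these 8 bytes are not reachable.
        System.out.println(hex(encode(0x7ff0000000000001L))); // prints 067ff8000000000000
    }
}
```

The tag byte in front of every entry is the part the attacker cannot change, which is why the spoofed structures have to be aligned around it.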
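
On the defensive side, secure validation can be switched on explicitly when building the validation context. The sketch below assumes the standard JDK XML Digital Signature API and its documented `org.jcp.xml.dsig.secureValidation` property; the placeholder `Signature` element and the all-zero HMAC key exist only to make the example self-contained, and real code would pass the signature node from the parsed document and a proper key or `KeySelector`.

```java
import java.security.Key;
import javax.crypto.spec.SecretKeySpec;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; only the JDK calls are real API.
class SecureValidationSketch {
    static final String SECURE_VALIDATION = "org.jcp.xml.dsig.secureValidation";

    // Builds a validation context with secure validation turned on,
    // instead of relying on the default.
    static DOMValidateContext hardenedContext(Key key, Node signatureNode) {
        DOMValidateContext ctx = new DOMValidateContext(key, signatureNode);
        ctx.setProperty(SECURE_VALIDATION, Boolean.TRUE);
        return ctx;
    }

    // Demo wiring with placeholder inputs (assumptions, not real data).
    static DOMValidateContext demoContext() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element signature = doc.createElementNS(
                "http://www.w3.org/2000/09/xmldsig#", "Signature");
            doc.appendChild(signature);
            Key key = new SecretKeySpec(new byte[32], "HmacSHA256");
            return hardenedContext(key, signature);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoContext().getProperty(SECURE_VALIDATION)); // prints true
    }
}
```

The context would then be handed to `XMLSignature.validate` as usual; setting the property only changes which constructs the validator is willing to process.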