Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

WorstFit: Unveiling Hidden Transformers in Windows ANSI! - 1582

Orange Tsai. Posted 1 Year Ago
  • Nowadays, Windows supports Unicode strings. This article discusses the evolution of string encodings on Windows and the resulting requirement for backward compatibility. Originally, Windows used ANSI encoding, which relied on code pages to represent languages whose characters did not fit within ASCII. These code pages were specific to a given language, so a Taiwanese message sent to a Japanese computer would be rendered differently.
  • In Windows, there are actually two types of code pages: ANSI code pages (the focus of this article) and OEM code pages. Eventually, Windows moved over to UTF-16, which uses 16 bits for most characters and 32 bits for rarer ones. As part of this change, Windows switched many of its APIs to UTF-16 wide characters. To remain backward compatible, there are two sets of APIs: A for ANSI and W for UTF-16 wide.
  • The main focus of this post is what happens when a wide character that doesn't exist in the current code page is passed to an ANSI API. Instead of erroring out, Windows attempts a best-fit match back to the current ANSI code page. For instance, the infinity character (∞) gets mapped to 8 on code page 1252. Different languages have different quirks. To test this out, the authors created a tool that shows these mappings. The goal is to abuse this "best fit" feature to trick programs on Windows into doing weird things.
  • The first instance they found was in the PHP-CGI server. The original vulnerability (from 2012) demonstrated that adding a dash (-) to a query parameter could be used for argument injection, eventually leading to code execution. Combining this same exploit method with the "best fit" trick achieves the same result: in the URL query ?%ADs, the %AD byte translates into a - on Chinese and Japanese computers, yielding -s. I remember reading the report but had no idea why this mapping happened. I investigated it at the time but never came to a good conclusion. Now I know!
  • What else is affected by this? The Yen and Won symbols on the Japanese and Korean code pages both map to \. Since this is an interesting character for directory traversal, it could be a useful exploit primitive. They found that the Cuckoo Sandbox could be escaped using this technique. The sandbox saw the string as containing only safe characters, but the file access APIs in Windows did the "best fit" mapping under the hood.
  • The next target is command line arguments, similar to the PHP bug. In PHP, the function escapeshellarg() is the standard way to prevent command injection and argument injection. In Python, subprocess executes the command after doing some escaping. Under the hood, this calls CreateProcess with the quoted parameters. If you can control ANY part of the data in the command, then U+FF02 (a fullwidth quotation mark) can be used to bypass this. The escaping functions leave it alone because it is not an ASCII quote, but the system does the best-fit mapping BEFORE the executable sees its arguments, at which point it becomes a real ".
  • This same attack can work by injecting a \ to cancel the escaping of another character. For instance, placing a Won sign directly before a " causes the escaping function to emit ₩\". Once the best-fit mapping is applied to the Won sign, this turns into \\", which voids the escaping of the double quote. They mention that argument splitting via spaces and tabs is also fruitful using other characters. Neat!
  • ElFinder, which is a PHP application, could be used to pop a shell by abusing the tar.exe command with argument injection. The Open-With feature has a handler table in Windows. Since the filename is part of the argument, it becomes an attack surface. With Microsoft Excel, renaming a file to an argument-splitting payload leads to confusion in how the command line is interpreted, which allows adding arbitrary arguments to Excel.
  • You're not safe even if your program is just a simple C program! Using int main defaults to the ANSI APIs for getting the arguments and environment variables of the process. The compiler adds this under the hood. A developer can use wmain instead to remediate this. Environment variables were a huge issue here as well, leading to LFI and a WAF bypass in some PHP software.
  • Disclosure of these bugs was difficult. Developers thought the bug was in Windows, while Microsoft said it needed to maintain backward compatibility. As a user, you can enable the beta UTF-8 feature on Windows. Additionally, use safe APIs instead of shell commands when possible. Is this the dawn of a new bug class on Windows? It appears so.
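
To make the quote and backslash tricks from the summary concrete, here is a minimal Python sketch. It is a simulation, not the real Windows conversion: BEST_FIT_SAMPLE is a tiny hand-written table containing only the mappings mentioned above, and simulate_best_fit and naive_quote are hypothetical helpers I made up for illustration. Real best-fit tables are per code page and much larger, and real escaping functions are more involved than this toy.

```python
# Hand-written, illustrative subset of best-fit mappings (an assumption for
# this sketch; the real tables depend on the active ANSI code page).
BEST_FIT_SAMPLE = {
    "\u221e": "8",   # infinity sign -> 8 (code page 1252, per the article)
    "\uff02": '"',   # fullwidth quotation mark -> ASCII double quote
    "\u20a9": "\\",  # won sign -> backslash (Korean code page)
    "\u00ad": "-",   # soft hyphen -> dash (Chinese/Japanese code pages)
}


def simulate_best_fit(text, table=BEST_FIT_SAMPLE):
    """Roughly mimic a wide-to-ANSI conversion: ASCII passes through,
    known characters are best-fit mapped, everything else becomes '?'."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(table.get(ch, "?"))
    return "".join(out)


def naive_quote(arg):
    """Toy escaping: wrap in double quotes and backslash-escape ASCII
    double quotes only. Stands in for a real escaping function."""
    return '"' + arg.replace('"', '\\"') + '"'


# 1) Fullwidth quotes pass through the escaping untouched...
payload = "a\uff02 --evil \uff02b"
quoted = naive_quote(payload)
# ...and turn into real quotes after the simulated conversion,
# so the single argument now parses as several.
after_conversion = simulate_best_fit(quoted)
print(after_conversion)

# 2) A won sign in front of a quote becomes a backslash that cancels
# the backslash added by the escaping.
won_payload = "\u20a9\""
print(simulate_best_fit(naive_quote(won_payload)))
```

The point of the sketch is the ordering: the escaping runs on the Unicode string, where the dangerous characters do not exist yet, and the mapping introduces them afterwards.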