Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

uncontained: Uncovering Container Confusion in the Linux Kernel- 1570

Jakob Koschel - VU Research. Posted 1 Year Ago
  • Type confusions are a bug class that appears in both memory safe and memory unsafe languages. In C, type confusions typically lead to memory corruption bugs.
  • The part of this paper that I enjoyed most was the discussion of C "class" and type hierarchies. C doesn't technically have type hierarchies, but it's possible to create a similar effect using structure embedding: a parent type is embedded as a field inside a child (container) type. To go between the parent and child parts of the object, you can just do some pointer math.
  • The Linux kernel uses the container_of macro to do this a lot. According to the authors, this technically violates the C language standard and is always an unsafe cast. Their goal was to find cases where the cast into the container (child) type is incorrect, leading to a type confusion bug they call container confusion.
  • They wrote a custom LLVM compiler pass that spots uses of container_of in the source code and builds a type system from them. This tracks all casts up and down. From there, they built a custom sanitizer called uncontained to detect a cast up followed by a cast back down to the wrong type.
  • An interesting design decision was checking at the time of use rather than at the time of the incorrect downcast. They found several scenarios where an incorrect downcast was harmless because the code only accessed the parent field through the downcast pointer.
  • In the Linux kernel, the sanitizer reported 37 cases of container confusion. Of these, 16 were false positives, 11 were unique bugs and 10 were anti-patterns, where the code only checks for the confusion later, within the small section of code they looked at. Besides simply downcasting to a static container, they found a few other types of bugs:
    • Empty List Confusion: When a list is in use but empty, both the next and prev fields of the list head point back to the head itself, so treating either one as a list entry produces a bogus container.
    • Mismatch on Data Structure Operators: Different locations in the code may treat a pointer as a different type depending on their needs. Of course, the offsets must be correct in this case.
    • Past-the-end Iterator: Search loops often walk a data structure until the end and break out early on a match. If no match is found, it's possible to use the iterator afterwards without checking its validity.
    • Containers with Contracts: An object may come with additional metadata that program semantics use to control what operations can be done on it, such as in the sysfs kernel subsystem. If these invariants are not kept, it leads to misuse of the pointer.
  • The sanitizer is not meant to have a 100% true positive rate. Instead, it's meant to point out potential locations and types of the bug. To me, this is completely reasonable as long as the false positive rate isn't too high. They then searched for all 5 patterns and found a total of 80 bugs, 179 anti-patterns and 107 false positives. Most of the false positives came from the first pattern, where the code had explicit tag type checks. Overall, a real bug 30% of the time is pretty amazing!
  • To me, this is absolutely amazing work. Taking a known bug class in the Linux kernel (and some other code) and writing a fairly accurate static analysis tool for it is awesome. Finding 80+ bugs in the Linux kernel at once is unheard of these days.
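
To make the structure embedding and container_of idea concrete, here is a minimal sketch in userspace C. It is not the kernel's macro (the real one adds compile-time type checks), and struct base, struct child_a, struct child_b and the kind tag are hypothetical names made up for illustration. The helper only downcasts when a tag confirms the container type, which is the kind of explicit check the summary mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct.
 * This is a sketch; the kernel's macro adds compile-time type checks. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical "parent" type that gets embedded in several containers. */
enum kind { KIND_A = 1, KIND_B = 2 };
struct base { enum kind kind; };

/* Two hypothetical "child" (container) types with different layouts. */
struct child_a { long secret;   struct base base; };
struct child_b { char name[16]; struct base base; };

/* Checked downcast: only recover a child_a if the tag says it is one.
 * An unchecked container_of on a child_b's base would be a container
 * confusion: the result would point at unrelated memory. */
static struct child_a *to_child_a(struct base *b)
{
    if (b == NULL || b->kind != KIND_A)
        return NULL;
    return container_of(b, struct child_a, base);
}

/* Small self-check covering the valid and the confused case. */
static int demo_checked_downcast(void)
{
    struct child_a a = { .secret = 7, .base = { .kind = KIND_A } };
    struct child_b b = { .name = "b", .base = { .kind = KIND_B } };

    assert(to_child_a(&a.base) == &a);   /* correct container recovered */
    assert(to_child_a(&b.base) == NULL); /* wrong container rejected */
    return 0;
}
```

The point of the sketch is that nothing in the pointer math itself knows which container a struct base lives in, so the only protection is a convention like the tag check above.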
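
The empty list and past-the-end iterator patterns can be sketched with a small circular doubly linked list. This is a simplified userspace reimplementation in the style of the kernel's list_head, not the kernel code itself, and struct item, first_item and find_item are hypothetical names. The assumption being illustrated is that the list head is a bare node that lives outside any item, so it must never be passed through container_of.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified circular list node, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical entry type that embeds the list node. */
struct item { int id; struct list_head node; };

/* Empty list confusion: on an empty list head->next == head, and the head
 * is not inside any struct item, so check before downcasting. */
static struct item *first_item(struct list_head *head)
{
    if (list_empty(head))
        return NULL;
    return container_of(head->next, struct item, node);
}

/* Past-the-end iterator: iterate over nodes and downcast only nodes that
 * are known not to be the head. A failed search returns NULL instead of
 * leaving the caller with a pointer computed from the head. */
static struct item *find_item(struct list_head *head, int id)
{
    for (struct list_head *p = head->next; p != head; p = p->next) {
        struct item *it = container_of(p, struct item, node);
        if (it->id == id)
            return it;
    }
    return NULL;
}
```

Returning NULL on the failure paths is one possible design choice; it forces callers to handle the "no entry" case explicitly rather than relying on them to compare the iterator against the head after the loop.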