The bug report for CVE-2022-42703, by Jann Horn, describes a use-after-free on struct anon_vma in the memory management (MM) subsystem of the Linux kernel.
The vulnerability is extremely complex and particular to the subsystem. From reading the bug report, there appears to be an incorrect assumption made on reference counting for VMA objects that trickles down into other portions of code. This bad logic leads to a use after free.
By triggering the vulnerability above, folio->mapping can be left holding a dangling reference to an anon_vma object. By calling madvise(..., MADV_PAGEOUT), the freed anon_vma can be accessed repeatedly.
The structure contains several pointers. The route the author took for exploitation was to reclaim the freed object with chosen addresses, so that the kernel operates on attacker-controlled pointers. After some primitive hunting, the author found that a call to down_read_trylock() would corrupt the memory at a chosen address.
To get this to work though, we need to be able to supply our fake structure. Since anon_vma belongs to its own kmalloc cache, it's not simple to free and reclaim. The author points to a known technique: free all of the objects in the slab page and flush the percpu freelist, so that the underlying memory is sent back to the regular allocator. With a spray, we can then reclaim that memory with data we control.
The arbitrary write comes with constraints. The 8-byte value at the chosen address is incremented by 0x100 only if its 3 least significant bits and its most significant bit are unset. Later, the value is decremented back down, so the corruption is only temporary. We also don't know the KASLR slide, making this even harder.
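To make the constraint concrete, here is a minimal userspace model of what down_read_trylock() does to the 8-byte counter it is pointed at. The bit names mirror the kernel's rwsem constants on 64-bit builds. The function names are made up for illustration, and the model ignores everything except the counter itself.

```python
MASK64 = (1 << 64) - 1

# Bit layout of the rw_semaphore counter on 64-bit kernels.
RWSEM_WRITER_LOCKED = 1 << 0
RWSEM_FLAG_WAITERS = 1 << 1
RWSEM_FLAG_HANDOFF = 1 << 2
RWSEM_FLAG_READFAIL = 1 << 63
RWSEM_READER_BIAS = 1 << 8  # 0x100, added for each reader

# If any of these bits is set, the trylock fails and nothing is written.
RWSEM_READ_FAILED_MASK = (
    RWSEM_WRITER_LOCKED
    | RWSEM_FLAG_WAITERS
    | RWSEM_FLAG_HANDOFF
    | RWSEM_FLAG_READFAIL
)


def down_read_trylock_model(count):
    """Hypothetical model: return (acquired, new_count) for a 64-bit value."""
    count &= MASK64
    if count & RWSEM_READ_FAILED_MASK:
        return False, count
    return True, (count + RWSEM_READER_BIAS) & MASK64


def up_read_model(count):
    """Hypothetical model of the later unlock, which undoes the increment."""
    return (count - RWSEM_READER_BIAS) & MASK64


if __name__ == "__main__":
    # 0x40 is an arbitrary example value with the required bits clear.
    for value in (0x40, 0x41, 1 << 63):
        acquired, new_value = down_read_trylock_model(value)
        print(hex(value), acquired, hex(new_value))
```

Under this model, a target is only useful if its current value passes the mask check and if adding 0x100, even briefly, changes behavior in a helpful way. A length field is a natural fit.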
On x86_64 Linux, when the CPU takes certain interrupts and exceptions, it switches to a dedicated stack that is mapped at a static, non-randomized virtual address. This has been abused in the past to build exploits that do not need knowledge of the KASLR slide.
What's the game plan then? Force an interrupt to occur. Once this happens, we can use our arbitrary write to corrupt the registers that were saved in that stack frame while the kernel context is suspended. The author chose to interrupt a call to copy_user since the data is controllable and the length lives in a consistent register (RCX) that we can overwrite.
It turns out that there is an interrupt for hardware breakpoints that is easy to trigger. So, the author wrote code to trigger this exploit method:
- Set up two processes: X and Y. Y is the original and X will ptrace Y.
- Set a hardware breakpoint at a known address in Y.
- Make a large number of uname requests from Y. This is because uname uses copy_to_user to write its result to userland.
- Trigger the breakpoint in Y. This causes the interrupted code's state, including its registers, to be saved in a stack frame at the known address.
- Slightly after the frame is saved, use the UAF to write to the saved length value in process Y's stack frame.
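The effect of the steps above can be sketched with a toy model in userspace. It is not exploit code. It only shows why adding 0x100 to a saved length register, between the moment it is saved and the moment it is restored, makes a copy run past its intended end. The function name, memory layout, and sizes are all hypothetical.

```python
READER_BIAS = 0x100  # amount the write primitive adds to its target


def interrupted_copy(kernel_mem, src, length, interrupt_at, corrupt):
    """Toy model of a byte-wise copy that is interrupted once.

    The remaining length (standing in for RCX) is saved when the interrupt
    fires and restored afterwards. With corrupt=True, the saved copy is
    incremented in between, as the write primitive would do.
    """
    out = bytearray()
    remaining = length
    pos = src
    while remaining > 0:
        if len(out) == interrupt_at:
            saved = {"rcx": remaining}       # state pushed to the known stack
            if corrupt:
                saved["rcx"] += READER_BIAS  # the UAF write lands here
            remaining = saved["rcx"]         # state restored on return
            interrupt_at = -1                # the interrupt fires only once
        out.append(kernel_mem[pos])
        pos += 1
        remaining -= 1
    return bytes(out)


if __name__ == "__main__":
    intended = b"U" * 16          # data the copy was meant to return
    adjacent = bytes(range(256))  # stands in for neighboring stack data
    kernel_mem = intended + adjacent

    normal = interrupted_copy(kernel_mem, 0, len(intended), 8, corrupt=False)
    leaked = interrupted_copy(kernel_mem, 0, len(intended), 8, corrupt=True)
    print(len(normal), len(leaked))
```

In the model, the uncorrupted copy returns exactly the intended bytes, while the corrupted one returns an extra 0x100 bytes of whatever follows the buffer.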
The technique above works for reading out too much data. When a stack buffer is being copied to userland, the over-read leaks the data needed to defeat KASLR and stack cookies. If the technique is inverted, it can be used to write too many bytes into the kernel as well. The target for this was prctl, creating a stack overflow where none existed before.
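As a small worked example of how a leak defeats KASLR, the slide is just the difference between a leaked kernel text pointer and the address that pointer would have without randomization. The sketch below assumes the usual unrandomized x86_64 text base and 2 MiB alignment. The leaked address and symbol offset are made-up values, not taken from the writeup.

```python
DEFAULT_TEXT_BASE = 0xFFFFFFFF81000000  # unrandomized x86_64 kernel text base
KASLR_ALIGN = 0x200000                  # assumed alignment of the slide (2 MiB)


def kaslr_slide(leaked_addr, symbol_offset):
    """Return the slide implied by a leaked pointer at a known text offset."""
    slide = leaked_addr - (DEFAULT_TEXT_BASE + symbol_offset)
    if slide < 0 or slide % KASLR_ALIGN:
        raise ValueError("leak does not match the assumed symbol offset")
    return slide


if __name__ == "__main__":
    # Hypothetical leaked return address and its offset within kernel text.
    leaked = 0xFFFFFFFF9A2B3C4D
    offset = 0x2B3C4D
    print(hex(kaslr_slide(leaked, offset)))
```

The alignment check doubles as a sanity test: if the pointer taken from the leak is not the one expected, the computed slide will almost never be aligned.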
Since the read leaks the stack cookie and KASLR slide, it is trivial to bypass both mitigations! Now, we can start a ROP chain. To mitigate this, the author suggested randomizing these areas of memory. They do note that the mitigation only helps against remote attackers, since a local user can still recover the addresses with a TLB timing side-channel.
Overall, an interesting technique for Linux exploitation. It's cool to see such a powerful primitive discovered that can be used in other locations.