Signals in Linux are a mechanism for telling a process to do something; put simply, they are a common form of inter-process communication (IPC). Notably, a signal can interrupt a process at an arbitrary point and transfer control to a handler that runs other code. This matters because the interrupted code may have left the program in an inconsistent state, similar to reentrancy attacks in web3.
The authors were looking at signal handlers in SSH when they noticed that the SIGALRM handler was calling non-async-signal-safe code, such as syslog(), when closing a connection after a timeout. This was a regression of a vulnerability from 2006, reintroduced by a 2020 change to OpenSSH.
Their initial idea for exploiting this was to target free(). In particular, the plan was to trigger the SIGALRM in the middle of a free() call, then have the handler enter malloc() while the heap was in that inconsistent state.
They started by trying to exploit this on a 2006 system, which had no ASLR, NX, or glibc malloc unlink protections. The goal of unlink is to remove a chunk from the free list in order to consolidate it with the neighboring space. If glibc is interrupted at the point where the chunk above has been freed but not yet added to the linked list, then the unlink will operate on attacker-controlled data. This gives an arbitrary 4-byte write, which the authors used to overwrite malloc's __free_hook and redirect code execution.
Sadly, this didn't work right away; the race window was just too tight. So they decided to increase their chances. The DSA public-key parsing code has four places where free() is called, and sshd allows six user authentications at a time, giving 24 free() calls that could be interrupted at the perfect moment by SIGALRM. After a month it worked, but they wanted to optimize further. They started timing when the alarm fired: one failure mode showed it triggered too early, while another showed it was too late.
The original post mentioned an exploit in Kerberos, which made them interested in PAM (Pluggable Authentication Modules). They found a spot where a cleanup function pointer is not yet initialized but will be soon. If they could interrupt the code with the alarm while this initialization was in flight, then uninitialized data from the heap could control the function pointer. Although this didn't work, they found a similar missing initialization with leftover heap data that led to an arbitrary free() vulnerability. They used the House of Mind technique to overwrite the _exit() entry so it points to shellcode on the stack. Pretty neat!
Now, to 2024! The only interesting code to hit was syslog(), which calls into malloc(). My first thought was "isn't there a lock on this?" However, glibc removed the lock for single-threaded code, which makes it not async-signal-safe. Within libc, they used the same leftover-data trick from before. When splitting a chunk into multiple parts, the FREE remainder is added back to the linked list BEFORE its new size is set. Since that memory can be controlled from a previous call (it's not cleared), we can overlap this chunk with our addresses! This is sick because it's relative and doesn't require any knowledge of ASLR.
Their goal was to use the async alarm to corrupt a function pointer inside a FILE structure's vtable. Since FILE pointers have accumulated many, many protections over years of abuse, this took some pretty crazy object faking, but it was doable, along with heavy heap grooming and timing to make this VERY specific case happen at a VERY specific time. The exploit takes about 8 hours to win the race because of ASLR on 32-bit and the tight timing window. There is no exploit on 64-bit at this time.
Awesome blog post on an RCE in SSH, of all things. A fuzzer could never have found this. To me, there are a few takeaways.
- Add regression tests for previous vulnerabilities. If a fix was made in the past, the bug is likely to come back via a developer who doesn't understand why the fix exists.
- Primitives are hard to find, but they are there! Taking the time to understand the constraints of your bug and working around them can still lead to big results.
- Esoteric knowledge leads to esoteric bugs.