The CTF challenge is running on a Raspberry Pie. The binary is a multithreaded C++ application for taking the length of a string. You can view or delete the results of this operation as well. The two main data structures are the request struct with a create+delete operation, as well as a thread safe ring/queue buffer that can be read and written for the strlen operation.
How it is thread safe? Instead of using locks, while loops are used in order to ensure that the queue is properly used. For instance, if the ring buffer is full, the producer will stay in an infinite loop until the ring buffer is not longer full. This same type of pausing is used on the consumer end when the queue is empty by checking if the head and the tail are the same.
CPUs are insane with all of the operations that they do. CPUs will commonly rearrange instructions, as long as the operations don't affect each other, in order to perform actions in parallel. In x86, the memory reordering is strict enough where nothing happens. On ARM, this is not the case.
Memory loads and stores can be reordered, assuming that the values down the line are not affected. In the example shown below, thread 1 could set the value of ready and value. However, since these could get reordered, the read on 4 may happen prior to the initialization in 1 even happening! Wow, that's wild.
volatile int value;
volatile int ready;
// Thread 1
value = 123; // (1)
ready = 1; // (2)
// Thread 2
while (!ready); // (3)
print(value); // (4)
Now, back to the program! In the ring buffer (a circular linked list), the consumer operation waits in a while loop for the head pointer to NOT equal the tail. In order to update this, the producer will remove the entry then sets the head to be the next element. The order seems fine, so what's the problem? CPU reordering!
If the CPU reorders the writing of the head pointer PRIOR to writing the pointer into the ring buffer, then our consumer thread can break out of the loop to consume the value. Since the pointer was never written to the queue (with the reordering), we will have access to a stale pointer from the previous usage of the slot in the ring buffer. There is a really good visual for how this works in the article as well.
This attack requires 20K requests in order to work. To make this work, we have an identifier on the string get updated, making it obvious when something has been corrupted. At this point, we have a pointer that points to the same object twice though!
The binary is compiled with partial RELRO (corruptable GOT entries) and no PIE, meaning that this would be a good target to overwrite a function pointer. Overwriting a function pointer (such as free) to point to system, is a wonderful way to exploit this.
To get an information leak, we can simply free one of the pointers we have access to. To turn this into an arbitrary read, a pointer in the chunk can simply be overwritten and point at an arbitrary location. Simple enough!
To turn this into a write primitive, we can abuse the second free! By writing to the location of the 'fd' pointer, we can trivially point to this any location and get the chunk back out later. GLibC Malloc does have quite a few mitigations though.
The double free protection can be bypassed by putting a chunk into the fastbin first, as the double free protections between the bins are not synced up. I love this trick! Overall, great write up! The bug was wild and I loved seeing into the mind of the exploit developer. Good work!