The vulnerability is a complete lack of input sanitization on values passed into kernel functions. This leads to a linear heap overflow, a TOCTOU race condition, an out-of-bounds read with a controlled index, and an arbitrary add primitive. These are all very serious bugs; any one of them could potentially be used to take control of the process.
The heap overflow looks like a good primitive because we can overflow values at the end of the ION buffer allocation in kernel space. This is done by winning a race to change the size of the array AFTER the size of another value has been chosen. This OOB write has mostly complete control of the buffer.
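To make the race concrete, here is a toy Python model of a double fetch. It is not the driver's code: `SharedHeader`, `vulnerable_copy`, the `count` field and `ENTRY_SIZE` are all hypothetical names, and the race is simulated deterministically by a callback that runs between the two reads.

```python
ENTRY_SIZE = 8  # hypothetical size of one array entry, in bytes


class SharedHeader:
    """Stands in for memory that userland can modify at any time."""

    def __init__(self, count):
        self.count = count


def vulnerable_copy(header, payload, heap, buf_off, race_window=None):
    """Size a destination from header.count, then re-read count for the copy."""
    alloc_len = header.count * ENTRY_SIZE   # first fetch: decides the size
    if race_window is not None:
        race_window(header)                 # the attacker wins the race here
    copy_len = header.count * ENTRY_SIZE    # second fetch: trusted again
    # Copy copy_len bytes into a region that was only sized for alloc_len.
    heap[buf_off:buf_off + copy_len] = payload[:copy_len]
    return alloc_len, copy_len


# Without the race, 16 bytes are sized and 16 bytes are copied.
heap = bytearray(64)
print(vulnerable_copy(SharedHeader(2), b"A" * 32, heap, 0))

# With the race, 16 bytes are sized but 32 bytes are copied,
# so the 16 bytes after the buffer are overwritten.
heap = bytearray(64)
grow = lambda h: setattr(h, "count", 4)
print(vulnerable_copy(SharedHeader(2), b"A" * 32, heap, 0, race_window=grow))
```

The usual fix for this pattern is to read the shared value once into a private variable and use that copy for both the size and the copy length.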
The OOB addition can add a single value at an attacker-chosen location. This is important because it may be possible to increment a pointer to an arbitrary value. This is done by setting an offset that is used to find an array of address_vector structs. The offset has no bounds or alignment checks. The offset does have to satisfy a constraint in order to perform the OOB write, though.
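The add primitive can be pictured with a small simulation. This is only a sketch: `add_at_offset` is a hypothetical helper, the flat `bytearray` stands in for kernel memory, and the 64-bit width of the addition is an assumption made for illustration.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def add_at_offset(memory, base, offset, value):
    """Add value to the word at base + offset and return the old word.

    There is deliberately no bounds or alignment check on offset,
    mirroring the missing checks described above.
    """
    addr = base + offset
    (old,) = struct.unpack_from("<Q", memory, addr)
    struct.pack_into("<Q", memory, addr, (old + value) & MASK64)
    return old


memory = bytearray(64)
BUFFER_BASE, BUFFER_LEN = 0, 16             # the "legitimate" buffer
VICTIM_OFF = 24                             # a pointer living past the buffer
struct.pack_into("<Q", memory, VICTIM_OFF, 0x1000)

# An offset of 24 is well past the 16-byte buffer, but nothing rejects it.
add_at_offset(memory, BUFFER_BASE, VICTIM_OFF, 0x230)
print(hex(struct.unpack_from("<Q", memory, VICTIM_OFF)[0]))
```

Because the primitive adds rather than overwrites, the attacker needs to know (or guess) the current value in order to land on a chosen result.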
The next section is rather interesting. The author first goes into HOW this could be exploited on iOS, then dives into Android. The iOS exploitation is hypothetical and references several CVEs.
The author chose to go with the arbitrary add primitive because the location of the data was next to kernel thread stacks! Using this, we could corrupt a size parameter that a syscall passes to memcpy in order to cause further memory corruption.
The next problem to solve was HOW to put a thread stack at this location. The main pain point is that on Linux, freed kernel memory of this kind is first placed into an unpurged_vm_area and only later returned to the usable heap memory. To combat this issue, the author sprayed a large number of binder allocations to create arbitrarily sized mappings and flush the cache, then spawned a large number of threads. With a little luck, a thread stack ended up in the proper place for our overflow.
Something that I was not familiar with is blocking. Blocking is when the execution of a program stops then re-executes later on, after some criteria has been met. This is useful for this primitive because the contents of a blocked thread appear on the thread stack.
Once the thread stack was in the proper area, the goal was to get a nice memory leak. This was done by first causing a page fault in one thread. When this page fault occurs, the fault sends data back to a thread stack which will send some errors back to userland. By altering a thread stack variable (the size of the data), an arbitrary memory leak can be performed to leak stack addresses, canaries and so on.
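The leak idea can be illustrated with a toy model in which a copy-out routine trusts a length stored on the stack. Everything here is hypothetical: `copy_to_user` is a stand-in that only shares a name with the kernel helper, and the stack layout (message, length, canary) is invented for the example.

```python
import struct

DATA_OFF, LEN_OFF, CANARY_OFF = 0, 8, 16    # made-up stack layout
CANARY = 0xDEADBEEFCAFEF00D


def copy_to_user(stack, data_off, len_off):
    """Return as many bytes as the on-stack length field says to return."""
    (n,) = struct.unpack_from("<I", stack, len_off)
    return bytes(stack[data_off:data_off + n])


stack = bytearray(32)
stack[DATA_OFF:DATA_OFF + 8] = b"faultmsg"       # data meant for userland
struct.pack_into("<I", stack, LEN_OFF, 8)        # its intended length
struct.pack_into("<Q", stack, CANARY_OFF, CANARY)

print(copy_to_user(stack, DATA_OFF, LEN_OFF))    # only the intended 8 bytes

# Corrupting the length (for example with an add primitive) widens the copy,
# so the bytes that follow the message, canary included, are returned too.
struct.pack_into("<I", stack, LEN_OFF, 24)
leak = copy_to_user(stack, DATA_OFF, LEN_OFF)
print(hex(struct.unpack_from("<Q", leak, CANARY_OFF)[0]))
```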
An additional idea (for the arbitrary write) was to overwrite a thread's SPSR (Saved Program Status Register). Doing this would change many parts of the execution environment, such as whether the thread was running in user or kernel space. Unfortunately, some quirks of this technique caused issues too difficult to work around.
The final solution (for the write) was to corrupt the value of 'n' of a file descriptor counter. Using this, an OOB write could occur on the stack to gain control of a RET address. Because this primitive allowed for an indexed partial write, the RET address could be changed without touching the canary! This came with several complications; the article explains how they were handled.
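Why an indexed write matters can be shown with a toy stack frame. This is a simplified model, not the real kernel layout: the slot positions, `make_frame`, `linear_overflow` and `indexed_write` are all hypothetical.

```python
CANARY = 0xDEADBEEFCAFEF00D
CANARY_SLOT, FP_SLOT, RET_SLOT = 4, 5, 6    # made-up frame layout


def make_frame():
    """Four local buffer slots, then canary, saved frame pointer, return address."""
    return [0, 0, 0, 0, CANARY, 0x7FF0, 0x401000]


def linear_overflow(frame, values):
    """Write consecutive slots starting at slot 0, as a linear overflow would."""
    for i, v in enumerate(values):
        frame[i] = v


def indexed_write(frame, index, value):
    """Write exactly one slot, leaving everything before it untouched."""
    frame[index] = value


def canary_intact(frame):
    return frame[CANARY_SLOT] == CANARY


f = make_frame()
linear_overflow(f, [0x41] * 7)              # reaches RET, but tramples the canary
print(canary_intact(f), hex(f[RET_SLOT]))

g = make_frame()
indexed_write(g, RET_SLOT, 0x1337)          # reaches RET and skips the canary
print(canary_intact(g), hex(g[RET_SLOT]))
```

In the linear case the function's epilogue check would notice the clobbered canary before the corrupted return address is ever used; in the indexed case the check passes.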
With the ability to overwrite a RET address with 15 64-bit values on the stack, it was time for the ultimate ROP chain. BPF (Berkeley Packet Filter) is a subsystem of Linux that allows for arbitrary commands to be run. By calling into it from the ROP chain, an arbitrary read/write/execute primitive could be created.