The main target was Instagram. But instead of targeting Instagram itself, they went after an open source library it uses: Mozjpeg, the Mozilla JPEG encoder.
Check Point set up an awesome fuzzing lab, just using the standard AFL (American Fuzzy Lop) to do the job. From one day of fuzzing on 30 CPUs, they found 400 'unique' bugs.
The first CVE is a pretty simple integer overflow in the dimension parsing. To find the amount of dynamic memory to allocate, the code calls malloc(width*height*output_component), where output_component is a value that depends on the type of image encoding. The width, height and output_component are all alterable by the user.
Because both width and height are 32 bit integers, the call writes only the low 32 bits of the product into the register for the size! This means that we can cause an integer overflow and allocate very little memory, while still being able to write a substantial amount of data! Wow, what a great bug :)
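To make the arithmetic concrete, here is a minimal sketch of the wrap-around. Python integers never overflow, so the 32 bit truncation is simulated with a mask. The function names and the dimensions are hypothetical, picked only so that the product lands just above 2^32; they are not taken from the original research.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a 32 bit register

def true_size(width: int, height: int, output_component: int) -> int:
    """Number of bytes the image data really needs."""
    return width * height * output_component

def alloc_size_32(width: int, height: int, output_component: int) -> int:
    """Size malloc would see if the product is truncated to 32 bits."""
    return true_size(width, height, output_component) & MASK32

# Hypothetical attacker-chosen values (assumption, for illustration only).
width, height, output_component = 0x8000, 0xAAAB, 3

needed = true_size(width, height, output_component)        # 4295000064 bytes
allocated = alloc_size_32(width, height, output_component)  # 32768 bytes
print(f"needed={needed} allocated={allocated}")
```

With these values the decoder believes it needs about 4 GiB but only 32 KiB would actually be allocated.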
The data from the image is then copied in line by line: each iteration copies width*output_component bytes, and height is the number of iterations. However, how do we exploit this? This LARGE copy is known as a WildCopy bug and can be hard to exploit, as an immense amount of data is being written.
There are three questions to ask when exploiting WildCopy bugs:
- Can we control (even partially) the content of the data we are corrupting with?
- Can we control the length of the data we are corrupting with?
- Can we control the size of the allocated chunk we overflow?
In this case, the data comes straight from the image (controllable), the height is used as the iteration count for copies of width * output_component bytes (so the length is controllable), and we control the size of the allocated chunk COMPLETELY.
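The three answers can be put into numbers with a small model of the copy loop. This is only a sketch under the assumptions stated in this write-up (one row of width * output_component bytes per iteration, height iterations, allocation size truncated to 32 bits); the function name and the returned keys are made up for illustration.

```python
def wild_copy_plan(width: int, height: int, output_component: int) -> dict:
    """Compare what gets allocated with what the row-by-row copy will write."""
    row_bytes = width * output_component         # bytes copied per iteration
    intended = row_bytes * height                 # total bytes the loop tries to write
    allocated = intended & 0xFFFFFFFF             # truncated size handed to malloc
    return {
        "row_bytes": row_bytes,
        "allocated": allocated,
        "rows_that_fit": allocated // row_bytes,  # full rows before the overflow starts
        "overflow_bytes": intended - allocated,   # bytes written past the chunk
    }

# Hypothetical dimensions (assumption): the product is just above 2**32.
plan = wild_copy_plan(0x8000, 0xAAAB, 3)
print(plan)
```

For these example values not even one full row fits in the chunk, so the overflow begins during the very first iteration and would run for roughly 4 GiB if nothing stopped it.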
How do we exploit this though? There are a few ways to deal with such a large overwrite:
- Race condition on heap allocations to other threads
- A STOP condition, prior to hitting unmapped memory
- Overwrite function pointer that is used on each iteration of the loop
In this situation, the decoding ran on a single thread and they couldn't find a stopping condition. But they did find a function that is called on each iteration and goes through function pointers stored on the heap: jpeg_read_scanlines.
The function jpeg_read_scanlines calls process_data_simple_main, which ALSO makes two calls through virtual function pointers. The cinfo struct ALSO holds many, many function pointers. By overwriting one of these pointers, we can get code execution.
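Why a pointer that is called on every iteration matters can be shown with a toy model. This is not the real layout of cinfo or of the jemalloc heap: the "heap" is a flat bytearray, the "function pointer" is a one byte index into a hypothetical HANDLERS table, and all names are invented for the sketch.

```python
def safe_handler(row: bytes) -> str:
    return "normal decode"

def attacker_handler(row: bytes) -> str:
    return "attacker-controlled code"

# Stand-in for code addresses: the slot stores an index into this table.
HANDLERS = [safe_handler, attacker_handler]

def run_toy_decoder(rows: list, buf_size: int) -> list:
    """Copy rows with no bounds check against buf_size, calling the slot once per row."""
    heap = bytearray(buf_size + 1)  # layout: [pixel buffer][1-byte pointer slot]
    heap[buf_size] = 0              # slot initially selects safe_handler
    offset = 0
    trace = []
    for row in rows:
        for b in row:
            if offset >= len(heap):  # toy version of hitting unmapped memory
                return trace
            heap[offset] = b         # may run past buf_size into the slot
            offset += 1
        # The decoder calls through the (possibly corrupted) slot every iteration.
        trace.append(HANDLERS[heap[buf_size] % len(HANDLERS)](row))
    return trace

print(run_toy_decoder([bytes(5), bytes(5)], 10))    # buffer sized correctly
print(run_toy_decoder([bytes([0, 0, 0, 0, 1])], 4))  # undersized buffer
```

In the undersized case the fifth byte of the first row lands in the slot, so the very next call is redirected before the copy has a chance to crash. That is the property the researchers were looking for.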
After this come some details on how memory is managed in both jemalloc and Mozjpeg. All in all, fancy heap grooming is done in order to put the cinfo struct directly after the undersized (integer-overflowed) buffer.
In the end, they talk about ACTUALLY putting this into use. Different file formats have slightly different quirks... Dealing with Instagram's customizations was a hurdle they were not able to clear. Still, this appears to be exploitable with some more effort.
Some takeaways/thoughts:
- WildCopy integer overflows are exploitable; you just need some very special circumstances to line up in order to make it possible.
- Good explanation of the heap grooming; definitely a good thing to read on your own.
- Simple bugs live in the most important things. Just find a target and attack :) Don't be shy!