When hunting for zero days, where do you even start? According to the author, sticking to a single target is a good strategy. In this case, there had been several reference counting bugs recently. Remembering that this code had similar patterns, they decided to audit the entire flow. Reference counting is used when an object can have more than one reference to it: each holder increments the count when it takes a reference and decrements it when it is done. Once the object has no references left, it can be safely destroyed.
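To make the reference counting idea concrete, here is a toy model in Python. It is only a sketch: the class and method names (`RefCounted`, `add_ref`, `release`) are made up for illustration and are not taken from the driver.

```python
class RefCounted:
    """Toy reference-counted object; a model, not real kernel code."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1       # the creator holds the first reference
        self.destroyed = False

    def add_ref(self):
        if self.destroyed:
            raise RuntimeError("add_ref on destroyed object")
        self.refcount += 1
        return self.refcount

    def release(self):
        if self.destroyed:
            raise RuntimeError("release on destroyed object")
        self.refcount -= 1
        if self.refcount == 0:
            # Last reference dropped: the object can be torn down.
            self.destroyed = True
        return self.refcount


obj = RefCounted("stream")
obj.add_ref()   # a second holder takes a reference (count is 2)
obj.release()   # first holder is done, object is still alive (count is 1)
obj.release()   # last holder is done, object is destroyed (count is 0)
```

Every `add_ref` has to be matched by exactly one `release`. One release too many either frees the object while someone still holds a pointer to it or, as in this model, touches an object that is already gone.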
Within some cleanup code for MSKSSRV, they noticed some bad code patterns. The first was around the process ID. Process context matters because kernel-mode code operates in a single shared address space, whereas each process has its own user-mode context. So, there's some code that checks whether the current process is the initializing or registering process for an object. The code was cleaning up FS information, which was weird because most dispatch routines, DispatchClose() included, run in an arbitrary process context. Although this wasn't a bug, it was clearly poorly written code that deserved a closer look.
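A rough way to picture why a process check inside an arbitrary-context routine is suspicious: if cleanup only happens when the caller is one of the processes tied to the object, a call from any other process silently skips it. The sketch below is a simplified model with hypothetical names (`TrackedObject`, `cleanup_if_owner`) and made-up process IDs; it is not the driver's logic.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    initializing_pid: int     # process that initialized the object
    registering_pid: int      # process that registered the object
    cleaned_up: bool = False


def cleanup_if_owner(obj, current_pid):
    """Clean up only when called from a process tied to the object."""
    if current_pid in (obj.initializing_pid, obj.registering_pid):
        obj.cleaned_up = True
        return True
    # Any other process context: the cleanup is skipped.
    return False


obj = TrackedObject(initializing_pid=100, registering_pid=200)
skipped = cleanup_if_owner(obj, current_pid=4)    # unrelated process
done = cleanup_if_owner(obj, current_pid=200)     # registering process
```

Whether the cleanup runs depends entirely on which process happens to be current when the routine is called, which is exactly the kind of thing worth auditing.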
They also noticed several cases where a function can decrement the reference count more times than should be possible. Although this looks like a use after free, a higher-level object is locked, which prevents that from happening. Still, weird code deserves a closer look!
From here, they created a detailed map of the reference counting across different types of processes and calls. Eventually, they came up with a way to trigger a use after free (UAF): too many things get cleaned up while pointers are not cleared. So, three references can actually be there and a single extra one is subtracted, freeing the object. The bug isn't trivial to hit but can be exploited, which is covered in the next post in the series.
The actual vulnerability stems from not setting the FsContext pointer to NULL during cleanup. Crazily enough, previous versions of the code did this! So how did the bug reappear? The feature flag Feature_Servicing_TeamsUsingMediaFoundationCrashes does not set the pointer to NULL. In Windows 10, this was always on. In Windows 11, this flag was missing, resulting in the vulnerability being present. Turning the flag on causes a crash in Microsoft Teams, which is likely why it was turned off.
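The effect of the missing NULL assignment can be sketched with a small simulation. This is a heavily simplified model, not the real driver flow: `ContextObject`, `FileObject`, `dispatch_cleanup`, `dispatch_close` and the `clear_pointer` parameter are all hypothetical stand-ins, and the real code has more references in play.

```python
class ContextObject:
    """Toy reference-counted context object."""

    def __init__(self):
        self.refcount = 1
        self.freed = False

    def release(self):
        if self.freed:
            raise RuntimeError("use after free: release on freed object")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class FileObject:
    """Toy file object that points at its context via fs_context."""

    def __init__(self, ctx):
        self.fs_context = ctx


def dispatch_cleanup(file_obj, clear_pointer):
    ctx = file_obj.fs_context
    if ctx is not None:
        ctx.release()
        if clear_pointer:
            # The step whose absence leaves a dangling pointer behind.
            file_obj.fs_context = None


def dispatch_close(file_obj):
    ctx = file_obj.fs_context
    if ctx is not None:
        ctx.release()              # extra decrement if the pointer is stale
        file_obj.fs_context = None


def run(clear_pointer):
    file_obj = FileObject(ContextObject())
    dispatch_cleanup(file_obj, clear_pointer)
    try:
        dispatch_close(file_obj)
    except RuntimeError as exc:
        return str(exc)
    return "ok"


with_clear = run(clear_pointer=True)      # close sees None and does nothing
without_clear = run(clear_pointer=False)  # close touches the freed object
```

With the pointer cleared, the close path has nothing left to release. Without it, the stale pointer lets a second routine drop a reference that no longer exists, which is the pattern behind the bug.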
The patch checks the thread's token against two specific privileged security IDs (SIDs). The author mentions that there is likely still an admin-to-kernel privilege escalation here, as the patch doesn't fix the underlying memory corruption issue. But Microsoft doesn't care about this type of privilege escalation, since it doesn't treat admin to kernel as a security boundary, so it's not a big deal. A fun article on a Pwn2Own entry!