Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Zenbleed- 1211

Tavis Ormandy - Posted 2 Years Ago
  • All x86-64 CPUs have vector registers called XMM registers. Recent CPUs have widened these from 128 bits to as much as 512 bits: 256-bit registers are called YMM and 512-bit registers are called ZMM. Besides number crunching, these are used in many libc calls for string-based operations because of their speed and parallelism.
  • The author shows an example within strlen()
    vpxor  xmm0,xmm0,xmm0
    ...
    vpcmpeqb ymm1,ymm0,YMMWORD PTR [rdi]
    vpmovmskb eax,ymm1
    ...
    tzcnt  eax,eax
    vzeroupper
    ret
    
  • The first instruction sets YMM0 to zero by XORing it with itself. The next instruction uses the pointer to our string in $RDI to check which bytes match YMM0 and stores the result in YMM1. Since YMM0 is all zeros, this is essentially checking which bytes are null. The vpmovmskb instruction transfers the result to the general purpose register eax as a bitmask. tzcnt counts the number of trailing zero bits. With 4 instructions, we have successfully found the position of the first null byte of a string!
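  • To make the mask-and-count trick concrete, here is a minimal scalar sketch in C of what those instructions compute on one 32-byte block. It is a model, not the libc implementation: the function names (null_byte_mask, trailing_zeros32, first_null_in_block) are hypothetical, and the real code does the comparison on all 32 bytes in parallel rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 32 /* width of one YMM register in bytes */

/* Models vpcmpeqb against an all-zero register followed by vpmovmskb:
   bit i of the result is set when block[i] is a null byte. */
uint32_t null_byte_mask(const unsigned char block[VEC_BYTES]) {
    uint32_t mask = 0;
    for (size_t i = 0; i < VEC_BYTES; i++) {
        if (block[i] == 0) {
            mask |= (uint32_t)1 << i;
        }
    }
    return mask;
}

/* Models tzcnt on a 32-bit value: the number of trailing zero bits,
   or 32 when no bit is set at all. */
unsigned trailing_zeros32(uint32_t mask) {
    unsigned count = 0;
    if (mask == 0) {
        return 32;
    }
    while ((mask & 1u) == 0) {
        mask >>= 1;
        count++;
    }
    return count;
}

/* Index of the first null byte in a 32-byte block, or 32 if there is none. */
unsigned first_null_in_block(const unsigned char block[VEC_BYTES]) {
    assert(block != NULL);
    return trailing_zeros32(null_byte_mask(block));
}
```

    A result of 32 means the block held no null byte, so a real strlen would move on to the next 32 bytes and repeat.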
  • The final instruction is vzeroupper. This zeroes out the upper bits of the vector registers, which is important for performance reasons. The processor has special locations for storing the state of these various registers: the Register File and the Register Allocation Table (RAT). The RAT keeps track of what space in the register file is assigned to each register. For instance, when zeroing out an XMM register, the 'z-bit' flag is set in the RAT. So, vzeroupper just sets this flag to release the resources.
  • All of that was background! So, what's the bug? Modern processors perform speculative execution in order to process data faster. It turns out that a vzeroupper is not rolled back correctly in the case of branch misprediction: the z-bit is simply unset again, even though the register file space it released may already have been reused. In a way, this creates a use-after-free-like scenario where a RAT mapping has been removed but is still used after the state is reverted.
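  • The use-after-free analogy can be sketched with a toy model in C. Everything here is an assumption made for illustration: the RegFile and RatEntry structs, the four-slot register file, and the toy_* functions are hypothetical and are not how AMD's hardware is actually organized. The model assumes that rollback only clears the z-bit without reclaiming the slot that was released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4

/* Toy register file: physical storage shared by everything on the core. */
typedef struct {
    uint64_t data[NUM_SLOTS];
    bool in_use[NUM_SLOTS];
} RegFile;

/* Toy RAT entry for the upper half of one vector register. */
typedef struct {
    int upper_slot; /* which slot holds the upper half */
    bool z_bit;     /* when set, the upper half reads as zero */
} RatEntry;

/* Take the first free slot and store a value in it; -1 if none is free. */
int alloc_slot(RegFile *rf, uint64_t value) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!rf->in_use[i]) {
            rf->in_use[i] = true;
            rf->data[i] = value;
            return i;
        }
    }
    return -1;
}

uint64_t read_upper(const RegFile *rf, const RatEntry *e) {
    return e->z_bit ? 0 : rf->data[e->upper_slot];
}

/* Set the z-bit and release the slot (the "free"). */
void toy_vzeroupper(RegFile *rf, RatEntry *e) {
    e->z_bit = true;
    rf->in_use[e->upper_slot] = false;
}

/* Assumed buggy rollback: the z-bit is cleared, but the entry still
   points at a slot it no longer owns. */
void toy_buggy_rollback(RatEntry *e) {
    e->z_bit = false;
}

/* Walks through the scenario and returns what the attacker reads. */
uint64_t demo_stale_read(void) {
    RegFile rf = {0};
    RatEntry attacker = { alloc_slot(&rf, UINT64_C(0x1111)), false };
    assert(attacker.upper_slot >= 0);

    toy_vzeroupper(&rf, &attacker);                       /* speculative */
    int victim_slot = alloc_slot(&rf, UINT64_C(0xDEADBEEF)); /* slot reused */
    assert(victim_slot == attacker.upper_slot);

    toy_buggy_rollback(&attacker);                        /* misprediction */
    return read_upper(&rf, &attacker);                    /* the "use" */
}
```

    In this model the attacker's register ends up exposing whatever the next owner of the slot wrote there, which is the shape of the leak described above.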
  • How do we exploit this? Many of the string operations, such as strlen and strcmp use these instructions. So, we can target a string with these vector registers. To exploit the bug, a few steps must be taken:
    1. Force an XMM Register Merge Optimization to occur. This can be done using the cvtsi2sd instruction.
    2. A register rename, which can be triggered using the vmovdqa instruction.
    3. A mispredicted vzeroupper. Forcing conditional branches to mispredict is a standard technique for speculative execution bugs.
  • After optimizing the branch prediction, the author was able to steal 30 kb per core per second. This works even from within VMs, since the leak is per core!
  • One thing I was wondering was how the bug was found. Manual review of microcode? No, fuzzing! In this case, fuzzing is complicated since A) crashes will not occur, so we need a different trigger, and B) there is no guidance. To get around problem A, the author ran the generated code in an emulator and then ran the same code on the CPU itself to compare the resulting state. If something was different, then it was potentially a bug. To make this more accurate, they added fence instructions like sfence in order to ensure they had full control of what was being executed.
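  • The "compare against an emulator" idea is a form of differential testing. Below is a minimal sketch in C, with loud assumptions: reference_model stands in for the emulator, device_under_test stands in for the real CPU and contains a deliberately planted discrepancy, and find_divergence is a hypothetical driver. The author's actual fuzzer generates and runs machine code, which this toy does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the emulator: the behaviour we trust. */
uint32_t reference_model(uint32_t x) {
    return x * 3u + 1u;
}

/* Stand-in for the hardware, with a planted bug on rare inputs. */
uint32_t device_under_test(uint32_t x) {
    if ((x & 0xFFu) == 0xABu) {
        return x * 3u; /* planted discrepancy */
    }
    return x * 3u + 1u;
}

/* Small xorshift32 generator for random test inputs; state must be nonzero. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Feeds random inputs to both sides. Returns 1 and stores the offending
   input in *bad_input on the first mismatch, or 0 if none is found
   within max_iters attempts. No crash is needed to flag a bug. */
int find_divergence(uint32_t seed, unsigned long max_iters, uint32_t *bad_input) {
    assert(seed != 0);
    uint32_t state = seed;
    for (unsigned long i = 0; i < max_iters; i++) {
        uint32_t input = xorshift32(&state);
        if (reference_model(input) != device_under_test(input)) {
            *bad_input = input;
            return 1;
        }
    }
    return 0;
}
```

    The mismatch itself is the oracle, which is how this approach works around the fact that the bug never causes a crash.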
  • To solve the second problem, the lack of guidance, the author used performance counters. These special registers count hardware-related events, and there are a lot of them! By using them to guide the fuzzer, it would automatically find interesting paths, which is super neat.
  • Prior to this vulnerability, I had never heard of the bulk of these things and had never considered microcode-level bugs. Overall, an awesome write-up on something that is out of my zone, but the author did a good job making it comprehensible. LiveOverflow recently made a video about this as well.