This post details the discovery and exploitation of a vulnerability in the Anza Solana validator, which is written in Rust, to achieve full remote code execution (RCE). Besides the vulnerability description, the authors discuss a lot of background (that I will skip) and the process of hunting for the bug, which is probably the most interesting part to me. Solana is extremely optimized, which comes at a cost in terms of security. Significant speed enhancements, such as those made to memory address translation, can conceal substantial security risk. For this reason, the team was closely monitoring new changes and features that looked dangerous.
Previously in Solana, all account data had to be copied into the VM, which is very slow. So, the Direct Mapping feature was born: it maps account buffers directly into VM memory rather than copying the entire data into the execution runtime. Dealing with raw pointers is very scary, which is why the authors of this post decided to keep looking at it. New code is also the most likely to have bugs in it.
In Solana, accounts have very important permission boundaries, such as only the owner of an account being able to modify it. Originally, the program worked on a local copy of the data, and these boundaries were validated after execution, or when setting up a CPI call, before anything was written to the global account cache. This is an important invariant to consider later.
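To make the "validate the local copy afterwards" model concrete, here is a minimal sketch of what such a check could look like. The `Account` type, the error names and `verify_after_execution` are all made up for illustration; they are not the validator's real types or API.

```rust
/// Hypothetical, simplified account record (not the validator's real type).
#[derive(Clone, Debug, PartialEq)]
struct Account {
    owner: [u8; 32],
    data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    /// Data changed although the executing program does not own the account.
    ExternalDataModified,
    /// The owner field was changed by a program that is not the owner.
    OwnerChanged,
}

/// Compare the program's local copy (`post`) with the snapshot taken before
/// execution (`pre`). Only if this passes would the local copy be committed
/// to the global account cache.
fn verify_after_execution(
    program_id: &[u8; 32],
    pre: &Account,
    post: &Account,
) -> Result<(), VerifyError> {
    let is_owner = pre.owner == *program_id;
    if pre.owner != post.owner && !is_owner {
        return Err(VerifyError::OwnerChanged);
    }
    if pre.data != post.data && !is_owner {
        return Err(VerifyError::ExternalDataModified);
    }
    Ok(())
}
```

The point of the sketch is the timing: an illegal write can happen on the local copy, but it is caught before it ever reaches shared state.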
With Direct Mapping, account.data points directly to the host's buffer through a Solana memory region. Because everything now uses a shared pointer instead of a private copy, validation must happen on each write, and the buffer is handled copy-on-write (CoW). This changes a key invariant of the system. The data is now read directly from the underlying DB implementation, and a private copy is only made in memory once the account is written to.
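A rough sketch of the per-write check plus copy-on-write idea, assuming a toy `Region` type of my own invention (the real memory region code is far more involved). I use `Rc::make_mut` from the standard library as a stand-in for the CoW step, since it clones the buffer only while it is still shared.

```rust
use std::rc::Rc;

#[derive(Debug, PartialEq)]
enum AccessError {
    ReadOnly,
    OutOfBounds,
}

/// Hypothetical region (not the validator's real type): it shares the host
/// buffer with the account DB until the first permitted write.
struct Region {
    host: Rc<Vec<u8>>,
    writable: bool,
}

impl Region {
    /// Reads go straight to the shared buffer; nothing is copied.
    fn read(&self, offset: usize) -> Option<u8> {
        self.host.get(offset).copied()
    }

    /// Every write is checked up front, because there is no later
    /// "compare the local copy" step to catch an illegal modification.
    fn write(&mut self, offset: usize, value: u8) -> Result<(), AccessError> {
        if !self.writable {
            return Err(AccessError::ReadOnly);
        }
        if offset >= self.host.len() {
            return Err(AccessError::OutOfBounds);
        }
        // Rc::make_mut clones the buffer only while it is still shared,
        // which gives copy-on-write behaviour.
        Rc::make_mut(&mut self.host)[offset] = value;
        Ok(())
    }
}
```

Note that the first successful write moves the region onto a fresh buffer, so anything that cached the old host address is now stale. That detail matters for the rest of the story.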
Direct Mapping also needs to handle account sizes changing. To make reallocation infrequent, buffers are overallocated. When CoW operations relocate data buffers during CPI execution, the MemoryRegions structure for the previous call still points to the old buffer. To fix this, the runtime uses vm_data_addr to find the memory region of the original mapping and eventually update it.
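The overallocation idea can be illustrated with a plain `Vec`. This is only an analogy: `HEADROOM` is a hypothetical constant I picked, and `alloc_account` and `grow` are illustrative helpers, not anything from the validator.

```rust
/// Hypothetical headroom; the real runtime's value is not taken from the post.
const HEADROOM: usize = 1024;

/// Allocate an account buffer with spare capacity so that modest growth
/// can happen in place.
fn alloc_account(data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(data.len() + HEADROOM);
    buf.extend_from_slice(data);
    buf
}

/// Grow the buffer by `extra` zero bytes. Returns true if the host address
/// moved, which is the case where cached regions would need fixing up.
fn grow(buf: &mut Vec<u8>, extra: usize) -> bool {
    let before = buf.as_ptr();
    buf.resize(buf.len() + extra, 0);
    before != buf.as_ptr()
}
```

Growing within the reserved capacity never moves the buffer, so the fix-up path only runs in the less common case where the buffer really is relocated.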
The vulnerability lies in missing input validation in this process: the value behind CallerAccount.vm_data_addr is stored entirely in the VM's heap memory, which the program controls. By modifying the AccountInfo.data pointer in VM memory before triggering a CPI call, an attacker can forge an arbitrary vm_data_addr value. This causes the wrong memory region to have its host address updated, so that region ends up mapped to an arbitrary location in virtual memory.
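Here is how I picture the bug, as a toy model. `MemoryRegion` below is a simplified stand-in with field names I chose, `fixup_unchecked` mimics trusting the VM-supplied address, and `fixup_checked` is just one possible hardening that I am assuming for contrast; it is not the actual patch.

```rust
/// Hypothetical, simplified stand-in for one entry of the VM's region table.
#[derive(Debug, Clone, PartialEq)]
struct MemoryRegion {
    vm_addr: u64,
    len: u64,
    host_addr: u64,
}

/// Fix-up that trusts `vm_data_addr` as read back from VM memory.
/// Returns true if some region had its host address replaced.
fn fixup_unchecked(regions: &mut [MemoryRegion], vm_data_addr: u64, new_host: u64) -> bool {
    for region in regions.iter_mut() {
        if vm_data_addr >= region.vm_addr && vm_data_addr - region.vm_addr < region.len {
            region.host_addr = new_host;
            return true;
        }
    }
    false
}

/// One possible hardening (my assumption, not the real fix): only accept the
/// VM-supplied address if it matches what the runtime recorded itself.
fn fixup_checked(
    regions: &mut [MemoryRegion],
    recorded_vm_addr: u64,
    vm_data_addr: u64,
    new_host: u64,
) -> bool {
    vm_data_addr == recorded_vm_addr && fixup_unchecked(regions, vm_data_addr, new_host)
}
```

With two regions at 0x1000 and 0x2000, a CPI on the first account with a forged `vm_data_addr` of 0x2000 makes the unchecked version rewrite the second region's host address, while the checked version refuses.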
Their original exploitation method was to break memory write authorization checks, a core invariant of Solana. Their first PoC made an account writable that should not have been, leading to a major loss-of-funds bug affecting all Solana programs. Upon pulling the newest code, the attack no longer worked because of a fix to an unrelated bug: the exploit was killed because the CPI account was now being set to ReadOnly.
The core issue still wasn't patched, so they just needed a new exploit strategy. They decided to tackle this with a binary exploitation-like approach! Accounts with different sizes could now be mapped over the top of each other. Using this fact, it's possible to read or write out of bounds on the directly mapped buffers in host memory. The access is limited to the range of the other account being used, though, which limits the size of the read/write.
They came up with a way to turn this into an arbitrary write that required 3 accounts: a small buffer (SWAP), a large buffer (LEVERAGE), and the exploitation account (POINTER). Here are the steps:
- Trigger the vulnerability to map the SWAP address over the top of the LEVERAGE address. This will allow us to read/write OOB on SWAP.
- Hunt for the POINTER account within the MemoryRegion. By setting a simple value on the account data, we can locate it after some searching.
- Replace the POINTER host_addr and set its state to writable.
- Write to POINTER at any location in memory that is desired. With arbitrary read/write, this is a basic exercise to get RCE.
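The steps above can be walked through on a toy model where "host memory" is just a flat byte slice. Everything here is my own invention for illustration: the `Window` type, the descriptor layout (`[TAG][host_addr][writable]`), the `TAG` value used to recognise POINTER's descriptor, and the `run_chain` helper. It only shows the shape of the chain, not the real heap layout or the real exploit.

```rust
/// Arbitrary tag the toy descriptor is recognised by (an assumption of this
/// model, standing in for whatever attacker-chosen value is searched for).
const TAG: u64 = 0x504F_494E_5445_5221;

/// Toy window into flat "host memory"; not the validator's MemoryRegion.
struct Window {
    host_addr: usize,
    len: usize,
    writable: bool,
}

fn read_u64(mem: &[u8], off: usize) -> Option<u64> {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(mem.get(off..off + 8)?);
    Some(u64::from_le_bytes(bytes))
}

/// Bounds-checked read through a window, the way the VM would access it.
fn win_read_u64(mem: &[u8], w: &Window, off: usize) -> Option<u64> {
    if off + 8 > w.len {
        return None;
    }
    read_u64(mem, w.host_addr + off)
}

/// Bounds- and permission-checked write through a window.
fn win_write(mem: &mut [u8], w: &Window, off: usize, bytes: &[u8]) -> bool {
    if !w.writable || off + bytes.len() > w.len {
        return false;
    }
    let start = w.host_addr + off;
    match mem.get_mut(start..start + bytes.len()) {
        Some(dst) => {
            dst.copy_from_slice(bytes);
            true
        }
        None => false,
    }
}

/// Rebuild POINTER's window from its toy descriptor in host memory:
/// [TAG: u64][host_addr: u64][writable: u8].
fn load_pointer(mem: &[u8], desc: usize, len: usize) -> Option<Window> {
    let host_addr = read_u64(mem, desc + 8)? as usize;
    let writable = *mem.get(desc + 16)? != 0;
    Some(Window { host_addr, len, writable })
}

fn run_chain(
    mem: &mut [u8],
    swap_host: usize,
    leverage_len: usize,
    pointer_len: usize,
    target: usize,
    payload: &[u8],
) -> bool {
    // Step 1: after the bug, LEVERAGE's large length applies to SWAP's small buffer.
    let overlap = Window { host_addr: swap_host, len: leverage_len, writable: true };
    // Step 2: scan the oversized window for POINTER's descriptor.
    let mut desc = None;
    let mut off = 0;
    while off + 8 <= overlap.len {
        if win_read_u64(mem, &overlap, off) == Some(TAG) {
            desc = Some(off);
            break;
        }
        off += 8;
    }
    let desc = match desc {
        Some(d) => d,
        None => return false,
    };
    // Step 3: overwrite the descriptor's host address and writable flag.
    if !win_write(mem, &overlap, desc + 8, &(target as u64).to_le_bytes()) {
        return false;
    }
    if !win_write(mem, &overlap, desc + 16, &[1]) {
        return false;
    }
    // Step 4: a normal, "checked" write through POINTER now lands on the target.
    match load_pointer(mem, swap_host + desc, pointer_len) {
        Some(pointer) => win_write(mem, &pointer, 0, payload),
        None => false,
    }
}
```

In this model every individual access is still bounds-checked. The chain works only because the window in step 1 carries the wrong length for the buffer it points at, which is exactly why the size of LEVERAGE limits how far the search can reach.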
I really really really like this post. It contains methodology on why and when they were looking at this section of code. It describes a deep understanding of the core invariants of the software and how they found a bug that breaks the invariants. Finally, their multiple attempts at exploitation were nice to see as well. This is one of the more "real" write ups that I have read on bug hunting. Solid work!