The main Solidity code generator had a compiler bug in the intermediate representation (IR). This is the story and impact of the bug from versions 0.8.28 and 0.8.33.
The IR pipeline generates reusable Yul helper functions during code generation. These helpers are deduplicated by name, so there can only be a single Yul function per name. The helper responsible for clearing a storage slot (delete) derives its name from the Solidity type being cleared. For instance, storage_set_to_zero_t_address for the address and the zeroth slot.
This is where the bug lives: the name does not include the type of storage - persistent or transient. Since there can only be one, the clearing operation encountered by the compiler determines the implementation used. There are two cases for this: transient when expecting persistent and persistent when expecting transient. Notably, this writes data to the wrong slot!
The article gives an example of the persistent being used when it should be transient. In this case, the owner address is in storage slot 0 and a locking address variable in transient slot 1. Upon setting the transient lock, the owner value is set unintentionally.
The other example is the opposite type. The contract has a mapping of uint256 to address as persistent storage and an address for the caller in transient storage. This code has an approval, a revoked approval, and a run to demonstrate the issue. In this case, if you try to call revoke(), it simply will not work. Neat!
The Solidity team's severity assessment seems fair. First, projects that run their test suite in --via-ir before deployment would have noticed this behaviour. This feels off at first, but I definitely don't trust the compiler, so I've tested in multiple settings like this before. Following the bug report, they found three affected contracts, which were notified and fixed.
There's a separate
blog post from the discoverers of the bug, Hexens, about this. While the Solidity team thinks this was low impact, Hexens believes it was critical. I think that assessing the impact of developer tools is hard. Is the impact the
worst possible impact or the impact + likelihood? If you go off the first, it's really bad; if you go off the second, the overall impact isn't very bad. The transient keyword isn't too popular yet; this would have been more likely in the future. In the case of the Vyper reentrancy bug, which affected a wide range of contracts with very concrete security implications.
Overall, a great blog post on the reason behind a compiler bug and the issues that resulted from it. I appreciated the concrete examples. The post doesn't mention how the bug is found which I would also like to know.