Tornado Cash is a smart contract cryptocurrency mixer. This allows users at one address to withdraw funds at another address without creating a traceable link between the two addresses. Seems impossible right? This is all about zero knowledge proofs. This is proof that an operation was carried out with giving the secret value. RSA math says I know two secret prime numbers without ever showing you those prime numbers.
The owners of Tornado got sued by the US government because of how well it works for money laundering. A user can put money in and there's no possible way to associate the address on the output. Although, you can associate an address with receiving funds from Tornado cash or depositing funds, you cannot figure out who sent to who.
The naive and not privacy preserving solution is to create two secret numbers and put the hash on the chain. If you can reveal the two secret numbers, then you can take the money out. To make this private, we can go over a list of hashes in a loop by ORing all of them together with zero knowledge proof data. This allows us to find a valid hash without revealing which hash is the one we proved.
Merkle trees are a common data structure used in blockchain because of the ease of verification. So, instead of iterating through a list of hashes, we can use a merkle tree instead. In particular, the withdrawer must prove knowledge of the preimage of a leaf without revealing the leaf and demonstrate that we have a valid Merkle proof for the node.
If we removed a node from the Merkle tree, this would disclose which item we were taking out. So, how do we prevent double spends? The hash from above was two numbers: a nonce and a secret value. The contract code uses a hash of the nonce to determine if this node has been used. So, if you tried to withdraw twice, it would not work. Naturally, this process of knowing the nonce is done as a zk proof. This strategy is called a nullifier scheme.
The Merkle tree needs to be faster to be usable. So, some optimizations have been made. First, the tree has a fixed depth of 2^32 deposits with every element starting off as 0s. Second, the left-most node that does not have a value is overwritten.
The quirks from above have immense impact on how the program works. Since all updates are performed on the left-most updated node, everything to the right is going to be 0. Because of this, they have recomputed all of these values of different depths to make the program more efficient.
The second shortcut is that since we add in a consistent order, we do not have to recalculate the tree all the time. Instead, we can use previously cached values and only update small chunks from the real update. Pretty neat!
The rest of the article goes into the code of the project. I still do not understand how the program performs the zero knowledge proofs but understand why they are being used at least. Awesome article!