The Merkle root construction of Bitcoin takes a double SHA256 hash of each transaction as a leaf. Then, each hash is concatenated and hashed until we reach the top. If there are an odd number of leaves at a level in the tree then the final element is duplicated.
CVE-2012-2459 exposes an issue with the odd number part of this. This is best done by example: let's take a tree with [1,2,3,4,5,6]. After [5,6] is hashed, it will need to be duplicated then hashed. Functionality, this is the same as [1,2,3,4,5,6,5,6]! There is a really good image of this in the paper.
Another issue is the lack of domain separators between nodes and leaves. A non-leaf node in the Merkle tree is a hash of a 64 byte input, which is the same as two hashes of its children. Meaningwhile, a leaf node is the has of a transaction. If the serialization of the transaction is 64 bytes, it can be impossible to distinguish between a leaf node and a non-leaf node.
With this knowledge, it begs the question: how much work is required to produce a 2 transaction block such that the merkle root is the hash of a 64 byte input? After showing which bytes are controlled and which are not, this requires about 8 bits of work in the first 32 bytes and 22 bits of work on the final 32 bytes, which is doable!
An SPV proof provides the client with a path through the Merkle tree; from the root down to the transaction. The SPV client also does not know how many TXs are in the block. So, if we can construct a transaction such that the second 32 bytes collide with the hash of a fake transaction, we can include this fake TX in the block.
Although this seems crazy initially, it's not. It's reasonable to have the second 32 bytes of a TX (with much control over it) be the output with a pre-calculated hash that was made. As a result, the TX we WANT to include will be included unintentionally. This requires 81 bits of work, which is much less than the 128 bits expected.