The Horton principle is "mean what you sign and sign what you mean". The reference comes from a Dr. Seuss book but does have profound impact. When signing/hashing data, no changes should be possible to it. Although this sounds simple, even a single item is difficult to do correctly. With multiple items, this becomes very complex.
For multiple items with hashing, it's common to turn the data into a string and hash. But, this just moves the problem to the serialization. By using something like ASN.1 or JSON or Protobuf, this mostly solves the problem though. These make the representation unambiguous, which isn't so simple to do.
When doing this for a Merkle tree, a load of problems comes into affect. The depth, the data in the nodes and levels all matter for the tree. In a binary merkle tree, the representation of how we hash the leaves matters. What happens if it's not a perfect element of 2? Do we go from the left or the right? All of these matter for the Horton Principle.
Domain separation is important - add byte separators for specific bytes. For instance, adding a 0x00 for a single entry element. In CometBFT, the hash byte slices are ordered and order-preserving. In Ethereum, they do something different - all unpopulated leaves are just empty. This means that the empty hash is used a lot, making it perfectly balanced in theory. Both of these satisfy the Horton principle.
The Open Zeppelin library takes in an input of leaves of length L. It starts pairing from the tail and takes a double hash concatenation to disambiguate between internal nodes and single leaves. A major deviation is that the leaves are sorted prior to hashing. This means that the tree does not preserve the order of the input sequence. In the documentation, they say it's assumed that the inputs are ordered.
A cross-chain protocol called Omni Network ported the Open Zeppelin version for their own use in Golang. Naturally, they forgot about the optionally of the leaves pair sorting and try to always sort. This is done by the blockchain for transmission but isn't actually required by the verifier. Great.
The Omnichain Network merkle tree data contains a LogIndex that is a monotonically increasing value. However, this value is NOT in the data used for hashing. Combining the optionally of leaf sorting with the neglect of this value in the hash, leads to the ordering of messages not being preserved. Practically speaking, this means that differently ordered xchain.Msg sequences lead to the same Merkle Root.
Even more crazy is that you can change the LogIndex to be different as well. {"Ping", 1} and {"Pong", 2} is just as valid as {"Pong", 1} and {"Ping", 2}. The ordering is not kept, as we can see. The author includes a PoC.
How did this sneak through the cracks of the project? The author has a great quote on this: As seen too often with cryptographic constructions, too many moving parts and options become the ingredients for the proverbial "hazardous material".. With the copying of the porting from the Open Zeppelin with a weird assumption being made in the library, it was bound to lead to an issue. Great write up!