HTTP Smuggling is, at its core, a difference in how two HTTP parsers understand the same request. What about differences in parsers for other things? The Bishop Fox article dives into differences between JSON implementations. There are several different versions of the JSON spec and some spots where the specification isn't tight. Why does this matter? Differences in parsers can lead to the same data having different meanings!
The first issue mentioned is inconsistent duplicate key precedence, such as {"test": 1, "test": 2} - a parser can take either the first or the second value. In the given example of a validate-proxy pattern, where one app validates then ships off the original data untouched, this can be a problem: the validation code would see 1 while the actual processing code would see 2.
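As a concrete illustration, CPython's built-in `json` module keeps the last duplicate key, so a validator built on a first-key-wins parser would check a different value than Python ends up processing:

```python
import json

# CPython's json module keeps the LAST duplicate key; a parser on the
# other side of a validate-proxy setup might keep the first instead.
doc = '{"test": 1, "test": 2}'
parsed = json.loads(doc)
parsed["test"]  # 2 in CPython
```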
Next is key collision via various means, with character truncation as the first class. There are several ways keys or data end up altered from the original by dropping bytes - the examples shown use invalid Unicode, an extra double quote, and a stray backslash. In the validate-proxy pattern, this can be used to get JSON validated on one side but interpreted differently on the other.
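One side of such a divergence is easy to see in Python: its `json` module happily parses an unpaired UTF-16 surrogate escape, while stricter parsers reject the string or truncate/replace at the bad character (the payload string here is just an arbitrary example):

```python
import json

# Python accepts a lone surrogate escape inside a string; parsers that
# truncate or reject invalid Unicode would see a different value here.
s = json.loads('"\\ud800deleteall"')
len(s)  # 10 -- the invalid surrogate survives as one character
```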
In the same group of classes is comment truncation. Apparently some JSON implementations support comments! Additionally, some support quoteless strings. Combining a quoteless string with a comment, one parser would see a comment while the other would just see a string. This seems fairly infeasible, as I've never seen any language support either of these features - but apparently, going from Golang's GoJay to Java's library would do it.
It's not just deserialization that can have issues - it's also serialization! For instance, take obj = {"test": 1, "test": 2}. In some implementations, obj["test"] returns 1, but obj.toString() returns 2. Sometimes, reserializing doesn't provide as much protection as you'd expect.
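This mismatch can be sketched in Python using `object_pairs_hook` to simulate a (hypothetical, not any specific library's) parser whose lookups keep the first duplicate while its serialization table keeps the last:

```python
import json

# Illustrative sketch: a parser whose key lookups return the FIRST
# duplicate value while its serialized output keeps the LAST one.
def split_pairs(pairs):
    first, last = {}, {}
    for k, v in pairs:
        first.setdefault(k, v)  # lookup side: first occurrence wins
        last[k] = v             # serialize side: last occurrence wins
    return first, last

lookup, to_serialize = json.loads('{"test": 1, "test": 2}',
                                  object_pairs_hook=split_pairs)
lookup["test"]            # 1 -- what access-time code sees
json.dumps(to_serialize)  # '{"test": 2}' -- what gets re-emitted
```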
Floats and large integers have their issues too. When numbers exceed the representable maximum or minimum, parsers do different things: some convert large numbers to infinity, others round aggressively, and per the article, the Go JSON parser can turn a large number into 0.
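CPython's `json` shows one of these behaviors directly - an out-of-range literal silently becomes infinity:

```python
import json
import math

# CPython's json parses an out-of-range numeric literal as float('inf');
# other parsers error out, round, or (per the article) return 0.
n = json.loads('1e999')
math.isinf(n)  # True
```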
The final group is just random things they found along the way. Some JSON parsers allow arbitrary garbage after the document, which enables things like CSRF attacks. They found a few segfaults in JSON parsers as well. They even took some time to look at binary JSON parsers, which all had fairly similar issues.
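Even within one language you can see both behaviors: Python's `json.loads` rejects trailing bytes, but the lower-level `raw_decode` (which streaming consumers sometimes use) parses the leading object and ignores whatever follows:

```python
import json

# json.loads raises on trailing data, but raw_decode parses the leading
# object and silently skips everything after it.
decoder = json.JSONDecoder()
obj, end = decoder.raw_decode('{"a": 1}<img src=x>')
obj  # {'a': 1}
end  # 8 -- bytes past this index are ignored
```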
The article finishes off with a list of issues in implementations from various languages, ranging from Python to Go to Rust. I personally found this extremely useful, as it helps isolate specific parser bugs to exploit.
Golang even has an interesting doc on its native JSON parser's weirdness.
How do we mitigate these types of issues? In the parsers themselves: generate errors instead of silently handling weird input, and follow the spec to a tee. For people building applications, it's a little harder. Validating and repackaging input - instead of validating and then passing on the original bytes - is a good way to stay secure here. Additionally, the more rigorous the checks, the better: ensure there are no extra keys, duplicate keys, invalid characters, etc. in the data. This will help prevent most issues.
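A minimal sketch of that validate-and-repackage idea in Python - the function names and the `allowed_keys` parameter are my own, not from the article - rejects duplicate keys during parsing and forwards only its own serialization:

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that errors on duplicate keys instead of
    silently picking one."""
    d = {}
    for k, v in pairs:
        if k in d:
            raise ValueError(f"duplicate key: {k!r}")
        d[k] = v
    return d

def validate_and_repackage(raw, allowed_keys):
    """Parse strictly, check the key set, and re-serialize -- never
    forward the attacker-controlled original bytes."""
    data = json.loads(raw, object_pairs_hook=reject_duplicates)
    extra = set(data) - set(allowed_keys)
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    return json.dumps(data)
```

Because the downstream service only ever sees `json.dumps(data)`, both sides are guaranteed to agree on what was validated.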
Overall, an awesome article on the world of JSON parsers. With how complicated software stacks are today, many of these parser combinations are common, leading to major issues. At the end, they reference a 2016 article called "Parsing JSON is a Minefield" that goes over the spec and some additional functionality it calls extensions. From the list of parsers tested there, most of the languages' built-in parsers (Rust serde, Golang JSON, etc.) didn't fall victim to any of these issues. I also found a GitHub repository with graphs and such on JSON parser compatibility.