The author of this post was curious about the various AI-native security scanners: they wanted to find a product on the market today that could identify vulnerabilities in code during a code review. So they tried numerous products, learned how they worked, and wrote up this blog post. Surprisingly, AI security auditors are advertised everywhere but can actually be found nowhere.
All of the products tested had a very similar set of offerings: full-code, branch, and PR scans. ZeroPath additionally has a SOC 2 report generator. Some of them offer hooks for things like GitHub Actions, bot guidance to developers, responses to PRs, and, naturally, IDE plugins. Finally, they all support auto-fix/remediation guidance as well.
The first step is to ingest all the code and index it appropriately. Once the code is uploaded, the tool can attempt to build the context the LLM needs to scan and understand it. Extra context about the types of issues to look for can be added to scans as well.
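To make the ingestion step concrete, here is a minimal sketch of what indexing might look like: splitting each source file into overlapping line-based chunks so that every chunk fits in an LLM context window. The function names, chunk sizes, and overlap are all illustrative assumptions, not taken from any of the tools reviewed.

```python
# Hypothetical sketch of the ingestion/indexing step. chunk_source and
# index_repo are invented names; real products likely use embeddings,
# symbol tables, or call graphs rather than plain line windows.

def chunk_source(text: str, max_lines: int = 40, overlap: int = 5):
    """Yield (start_line, chunk_text) pairs as overlapping windows,
    so context around a chunk boundary isn't lost between chunks."""
    lines = text.splitlines()
    step = max_lines - overlap
    for start in range(0, max(len(lines), 1), step):
        window = lines[start:start + max_lines]
        if not window:
            break
        yield start + 1, "\n".join(window)

def index_repo(files: dict) -> dict:
    """Build a simple index: file path -> list of (start_line, chunk).
    `files` maps paths to source text (already read from disk)."""
    return {path: list(chunk_source(src)) for path, src in files.items()}
```

A scanner could then retrieve the chunks relevant to a query (for example, all chunks mentioning a sink function) and feed them to the LLM together with the extra per-scan context the author mentions.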
The next part is more of the "secret sauce". Simply asking an LLM to find all vulnerabilities won't be very helpful, so how does each tool pick the particular code to focus on? The tool could drive function-by-function or file-by-file analysis; some use permissive CodeQL queries, opengrep, or other AST traversal of the application. Once it has a candidate vulnerability, it performs a more detailed analysis to determine whether the finding is real.
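The AST-traversal approach described above can be sketched with Python's built-in `ast` module: walk the tree, flag any function that calls a known-risky sink, and treat those functions as candidates for the LLM's deeper triage pass. The sink list and function names here are illustrative assumptions, not the queries any particular product ships.

```python
import ast

# Illustrative sink list -- real tools use far richer rule sets
# (CodeQL queries, opengrep patterns, taint tracking, etc.).
RISKY_CALLS = {"eval", "exec", "system", "popen"}

def candidate_functions(source: str) -> list:
    """Return names of functions that call a risky sink: the cheap
    AST pass that selects candidates for expensive LLM analysis."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    # Handle both bare names (eval) and attributes (os.system).
                    name = getattr(sub.func, "id",
                                   getattr(sub.func, "attr", ""))
                    if name in RISKY_CALLS:
                        hits.append(node.name)
                        break
    return hits
```

The point of this two-stage design is cost: the AST pass is fast and broad, while the LLM pass is slow and precise, so only the small candidate set gets the detailed "is this real?" analysis.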
The final stage involves reporting vulnerabilities, which includes filtering out false positives and de-duplicating findings. According to the author, these tools didn't report as many false positives as traditional SAST tools. Some of them were better or worse at specific languages, and some were better at particular vulnerability classes.
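De-duplication in that final stage is often done by fingerprinting findings so that re-scans and overlapping detectors don't report the same issue twice. Here is a minimal sketch under that assumption; the fingerprint fields and normalization are invented for illustration, not how any reviewed product actually keys its findings.

```python
import hashlib

def fingerprint(finding: dict) -> str:
    """Stable fingerprint for a finding, keyed on rule, file path, and a
    whitespace-normalized snippet (illustrative choice of key fields)."""
    snippet = " ".join(finding["snippet"].split())
    key = f'{finding["rule"]}|{finding["path"]}|{snippet}'
    return hashlib.sha256(key.encode()).hexdigest()[:16]

def dedupe(findings: list) -> list:
    """Drop findings whose fingerprint was already seen."""
    seen, unique = set(), []
    for f in findings:
        fp = fingerprint(f)
        if fp not in seen:
            seen.add(fp)
            unique.append(f)
    return unique
```

Normalizing the snippet (rather than hashing raw line numbers) is the usual design choice here: it keeps the fingerprint stable when unrelated edits shift the vulnerable code up or down in the file.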
Gecko and Amplify performed very poorly, finding no real bugs. Almanax was very inconsistent: sometimes it found basic bugs and other times it didn't, though it was very good at deliberately backdoored code. Corgea found about 80% of the purposely vulnerable code that was scanned, with roughly a 50% false positive rate, which isn't really that bad. The language being scanned made a huge difference in the quality of this tool's results.
ZeroPath, according to the author, found 100% of the vulnerabilities in the test corpora. Additionally, it identified legitimate bugs in real-world codebases, including curl and sudo. Most of those real-world bugs weren't security issues, but they were bugs nonetheless. This was the best tool of the bunch.
Some takeaways:
- The biggest benefit is around surfacing inconsistencies between developer intent and the actual implementation.
- The tools were good at finding business logic issues.
- They may replace pentesters in the long term, or at least supplement them. For engagements without millions of dollars on the line, they are already a good fit.
I really like the tone of the article and its perspective of seeing the AI as a helper: for instance, noting that while the AI does miss bugs, so do humans. The comparisons are realistic, which I appreciate. Good article!