This blog post delves into the results of an autonomous Solidity auditor called "V12". It has a web UI that makes it easy to interact with. According to the authors, it performs at or exceeds the level of junior auditors at some firms. It can find many basic programming mistakes, some even missed by established firms. It will integrate with C4 and Zellic/Zenith audits, and will also be available as a standalone application and a GitHub Action.
The mission is sane - security is a continuous battle, not a commit hash frozen in time, and products/services should reflect this. Naturally, this doesn't replace an auditing company, but it can help the service team in the long term. Catching even simple issues, like access control vulnerabilities, improves security as a whole.
I appreciate that they include an Evaluation section for bugs the tool has found. They show several vulnerabilities from previous hacks, such as the 1Inch bug, the MonoX hack and a couple of others. The 1Inch example is slightly deceptive - it was caused more by a scoping issue and had actually been found by auditors.
The tool has competed in several live Cantina/HackenProof auditing contests. I find these most impressive, since there was no potential for "taint" on the model. These are unique vulnerabilities that other participants also found in a contest.
They also list several historical contests, which could potentially be tainted in the data set. For a proper evaluation, the training and test sets must be completely disjoint. For the other contests they list, they claim V12 found enough bugs that it would have placed well in the competition. Finding 2 out of 2 highs and 4 out of 6 issues are the highlights from this section. I'm slightly skeptical about this; was there some contamination between the training and testing data sets? If so, then why didn't it perform as well on the live contests it posted?
They also use the tool on their live audits. Many of the bugs are fairly simple, such as access control issues, reentrancy and bad error handling. They even mention this themselves, which is an interesting analysis. These are all findings that would fit well in a CI setting and as an assistant to a security researcher. As LLMs get better, I think the remaining vulnerabilities will become harder and harder to find, but also more valuable.
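To make the "fairly simple" bug class concrete, here is a minimal sketch (my own hypothetical example, not one of V12's actual findings) of the kind of missing access control check that an automated scanner in CI could plausibly flag:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, written for illustration only.
contract Vault {
    address public owner;

    constructor() {
        owner = msg.sender;
    }

    // BUG: no owner check, so any caller can drain the contract.
    function withdraw(uint256 amount) external {
        // require(msg.sender == owner, "not owner"); // <- the missing guard
        payable(msg.sender).transfer(amount);
    }

    receive() external payable {}
}
```

Bugs of this shape are pattern-like enough that catching them automatically on every commit, rather than once at audit time, is exactly the continuous-security argument the post makes.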
Their perspective on who should use the tool is wise. V12 can enhance the capabilities of a great researcher, but should only be used at the end of a review. It's more of an additional layer of assurance and a source of inspiration than anything else. For inexperienced researchers, it's mostly a crutch. I'm curious to see how this plays out.