Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

How to scan for vulnerabilities with GitHub Security Lab’s open source AI-powered framework

GitHub Security Lab
  • GitHub Security Lab has an agent specialized in finding security vulnerabilities, as described here. Given a set of taskflows defined in YAML, the LLM performs the actions they describe. This is meant to be a generalized framework for adding the appropriate context for a codebase, such as defining entry points and user roles. Using many smaller tasks instead of a single large prompt forces each step to be done more accurately.
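The post describes taskflows only at a high level, but based on what it says (YAML-defined tasks, entry points, user roles), a taskflow might look roughly like this sketch; every field name here is my own invention, not the framework's actual schema:

```yaml
# Hypothetical sketch of a YAML taskflow; all field names are invented.
name: audit-web-app
context:
  entry_points:
    - "POST /api/documents/:id/permissions"   # untrusted input arrives here
  user_roles: [admin, collaborator, guest]
stages:
  - task: threat_model     # identify sub-components and untrusted input
  - task: suggest_issues   # propose vulnerability classes to investigate
  - task: audit_issues     # demand a concrete attack scenario and evidence
```

Splitting the work into small, named stages like this is what lets each step get its own focused prompt and context.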
  • The taskflow starts with a threat modeling stage, which drastically reduces false positives. It identifies individual applications or sub-components, entry points for untrusted input, and user actions. The context gained here feeds the issue suggestion stage, where the emphasis is on intended usage and risk from untrusted input above everything else. The LLM has the freedom to explore and suggest different vulnerabilities here.
  • In the issue auditing stage, the agent takes the suggested vulnerability classes and attempts to find concrete issues. This is mostly a lot of complicated prompting. To avoid LLM hallucinations, it asks for a concrete attack scenario and concrete evidence in the source code. Another trick is telling the model that there may NOT be vulnerabilities in the component. To avoid unnecessary reports, it emphasizes that only high-severity problems should be reported and that the threat model should be consulted on whether something actually has impact.
  • After creating this, they ran it on several open source web applications. Within Outline they found an authorization logic vulnerability that allowed for privilege escalation. In particular, a lower-privilege user with READ/WRITE on a document was able to manage groups on that document. This would have allowed a non-admin document collaborator to grant permissions via groups, including admin permissions.
  • The next set of vulnerabilities was found in e-commerce platforms. In WooCommerce, they found a way to view all guest orders. In Spree Commerce, they found a way to enumerate addresses and phone numbers on all guest orders because of an incrementing value. I suppose this makes sense; guest checkouts are permissionless, so it becomes difficult to scope who has access to them.
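Incrementing identifiers make this kind of enumeration trivial to script. A minimal sketch of the problem; the endpoint path is invented for illustration, since the post doesn't give the actual URL structure:

```typescript
// Hypothetical sketch: with sequential guest-order IDs, every order URL
// is guessable. The endpoint path here is invented, not Spree's real one.
function guestOrderUrls(base: string, firstId: number, count: number): string[] {
  const urls: string[] = [];
  for (let id = firstId; id < firstId + count; id++) {
    // An attacker simply walks the ID space.
    urls.push(`${base}/guest_orders/${id}`);
  }
  return urls;
}
```

Random, high-entropy order tokens (instead of sequential integers) would make each guest order URL unguessable, which is the usual fix for this bug class.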
  • The final vulnerability was a funny subtlety of TypeScript. Password validation using bcrypt happened in a separate async function, which returns a promise. When using this value, the code performed an && operation on the promise itself to compute a boolean! Promises are always truthy, so the valid flag was always true whenever a user had a bcrypt password set. It's interesting that the LLM picked out this subtle bug. Realistically, this should have been found via testing... a single happy and sad test case would have caught it.
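The bug class is easy to reproduce. A minimal sketch, where verifyHash is a stand-in for a bcrypt.compare-style async check; all names here are mine, not the actual project's:

```typescript
// Stand-in for an async bcrypt.compare-style check (hypothetical names).
async function verifyHash(supplied: string, stored: string): Promise<boolean> {
  return supplied === stored; // real code would compare against a bcrypt hash
}

function buggyLogin(storedHash: string, supplied: string): boolean {
  // BUG: verifyHash returns a Promise<boolean>, and any Promise object is
  // truthy, so the && expression never sees the actual comparison result.
  return Boolean(storedHash && verifyHash(supplied, storedHash));
}

async function fixedLogin(storedHash: string, supplied: string): Promise<boolean> {
  // FIX: await the promise so the real boolean is used in the check.
  return Boolean(storedHash) && (await verifyHash(supplied, storedHash));
}
```

buggyLogin returns true for any supplied password as long as a hash is set, which is exactly the sad-path test case that would have caught this.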
  • When running the tool on 40 repos, they found 1K potential issues. After the audit stage, 139 were marked as legitimate. After de-duplication, 91 remained; this step was required because they ran the tool multiple times to find more bugs. Of those 91, 20 were outright false positives, 52 were extremely low-severity bugs, and 19 had meaningful impact worth reporting.
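The funnel's final numbers (taken from the post) add up cleanly, and they put the real-impact rate among deduplicated findings at roughly one in five:

```typescript
// Triage funnel from the post: ~1K candidates -> 139 audited as legitimate
// -> 91 after de-duplication -> three final buckets.
const falsePositives = 20;
const lowSeverity = 52;
const meaningfulImpact = 19;
const deduplicated = falsePositives + lowSeverity + meaningfulImpact; // 91
const impactRate = meaningfulImpact / deduplicated; // ~0.21
```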
  • In a table, they show the classes of vulnerabilities that they found. The most common are access control issues; this feels like a class of issues that LLMs are better at finding than classic static analysis, because of the context required to understand them. The tool found several XSS and CSRF issues as well; given the amount of PHP code they scanned, this makes a little more sense. From there, authentication, path traversal, and SSRF had some hits as well. Authentication issues tend to be critical, so it's surprising to me that this many were found.
  • From the takeaways, they point out that LLMs are good at finding logic bugs. More complicated logic bugs are probably not going to be found for a while, imo. They claim that LLMs are good at rejecting low-severity issues, which drastically reduces the noise. They note that the agent struggles when protections live in different sections of the codebase or are mitigated by third parties, such as the browser. Finally, LLMs are very good at threat modeling and account for expected usage very well. Overall, a fantastic post on LLM bug hunting!