Resources
People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Agave Network Patch: Root Cause Analysis- 1475

Anza Reference →Posted 1 Year Ago

Agave and Jito are Solana validator clients. Solana programs are eBPF bytecode stored in an ELF file, which the validator's VM executes.
The development toolchain aligns the ELF program, but the ELF sanitization process for uploaded programs had no alignment check. The CALL_REG opcode assumes its jump target is aligned to an instruction boundary. With a misaligned .text section, the VM jumps to an invalid address, crashing the node.
An attacker could exploit this by writing a program that executes the CALL_REG opcode, manipulating the program's ELF file to misalign its .text section, and then deploying and invoking the program on the Solana network as normal.
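A minimal sketch of the kind of alignment check that was missing, assuming 8-byte eBPF instruction slots. The function names and parameters are hypothetical and are not taken from the Agave code base.

```python
INSN_SIZE = 8  # assumption: each eBPF instruction slot is 8 bytes wide


def text_section_is_aligned(text_offset: int, text_len: int) -> bool:
    """Upload-time check: .text starts and ends on an instruction boundary."""
    return text_offset % INSN_SIZE == 0 and text_len % INSN_SIZE == 0


def call_target_is_valid(target: int, text_start: int, text_len: int) -> bool:
    """Run-time check for a register-supplied call target:
    it must land inside .text and on an instruction boundary."""
    offset = target - text_start
    return 0 <= offset < text_len and offset % INSN_SIZE == 0


if __name__ == "__main__":
    print(text_section_is_aligned(0x120, 64))            # aligned section
    print(text_section_is_aligned(0x121, 64))            # shifted by one byte
    print(call_target_is_valid(0x1011, 0x1000, 0x100))   # mid-instruction target
```

Either check alone would reject the misaligned program described above. Checking at upload time is cheaper because it runs once per deployment.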
To deploy the patch, a supermajority of the network was needed. Core contributors privately contacted large validators with the patched code. Once the network was safe, the updated code was put on GitHub.
The article was okay. I wish it had more code snippets and explanations of how the VM works.

Weaknesses in Bitcoin’s Merkle Root Construction- 1474

Linux Foundation Reference →Posted 1 Year Ago

The Merkle root construction of Bitcoin takes a double SHA256 hash of each transaction as a leaf. Then, each pair of hashes is concatenated and hashed until we reach the top. If there is an odd number of leaves at a level in the tree, then the final element is duplicated.
CVE-2012-2459 exposes an issue with the odd-number handling. This is best shown by example: let's take a tree with [1,2,3,4,5,6]. After [5,6] is hashed, the result needs to be duplicated and then hashed. Functionally, this is the same as [1,2,3,4,5,6,5,6]! There is a really good image of this in the paper.
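The duplication issue can be reproduced in a few lines. This is a simplified sketch: the "transactions" are single placeholder bytes rather than real serialized transactions, and the helper names are my own.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA256, as used for Bitcoin txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txs: list[bytes]) -> bytes:
    if not txs:
        raise ValueError("need at least one transaction")
    level = [dsha256(tx) for tx in txs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd level: duplicate the final element
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [bytes([i]) for i in range(1, 7)]   # stand-ins for transactions 1..6
mutated = txs + txs[4:]                   # [1,2,3,4,5,6,5,6]

if __name__ == "__main__":
    print(merkle_root(txs) == merkle_root(mutated))  # same root, different tx list
```

Two different transaction lists sharing one root is what makes the block malleable.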
Another issue is the lack of domain separators between nodes and leaves. A non-leaf node in the Merkle tree is a hash of a 64 byte input, which is the concatenation of the two hashes of its children. Meanwhile, a leaf node is the hash of a transaction. If the serialization of the transaction is 64 bytes, it can be impossible to distinguish between a leaf node and a non-leaf node.
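A tiny illustration of the ambiguity, using placeholder byte strings instead of real transactions. The 64 bytes fed into an inner node hash the same way whether they are read as two child hashes or as one 64 byte "transaction".

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


left = dsha256(b"placeholder tx A")    # 32-byte child hash
right = dsha256(b"placeholder tx B")   # 32-byte child hash

inner_node = dsha256(left + right)     # how a non-leaf node is computed

fake_tx = left + right                 # the same 64 bytes, read as a transaction
fake_leaf = dsha256(fake_tx)           # how a leaf is computed

if __name__ == "__main__":
    print(len(fake_tx), fake_leaf == inner_node)
```

With no prefix byte separating the two cases, a verifier that only sees hashes cannot tell which interpretation was intended.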
With this knowledge, it begs the question: how much work is required to produce a 2 transaction block such that the merkle root is the hash of a 64 byte input? After showing which bytes are controlled and which are not, this requires about 8 bits of work in the first 32 bytes and 22 bits of work on the final 32 bytes, which is doable!
An SPV proof provides the client with a path through the Merkle tree, from the root down to the transaction. The SPV client also does not know how many TXs are in the block. So, if we can construct a transaction such that the second 32 bytes collide with the hash of a fake transaction, we can include this fake TX in the block.
Although this seems crazy initially, it's not. It's reasonable to make the second 32 bytes of a TX (over which there is much control) match a pre-calculated hash. As a result, the TX we WANT to include will be included unintentionally. This requires 81 bits of work, which is much less than the 128 bits expected.

Bucket Monopoly: Breaching AWS Accounts Through Shadow Resources- 1473

Nautilus Reference →Posted 1 Year Ago

S3 buckets are file storage on AWS. AWS eats its own dogfood a lot, meaning that many of the AWS services will use S3 under the hood.
In CloudFormation (infrastructure as code), a per-region S3 bucket was created when CF was first turned on in that region. This bucket contains all of the user-defined templates.
S3 bucket names live in a global namespace, unlike most things at Amazon. So, there is a classic vulnerability called bucket sniping. This is all about registering the bucket before someone else does, then having access to it once the service starts using it.
By creating the bucket in account B, when account A set up CF in that region, account A would use the attacker-controlled bucket. To make this viable, the bucket must be made public and have a very generous resource policy that allows account A access.
At this point, the template can be modified to add resources in the account! The authors mention a TOCTOU issue in CloudFormation from previous research to add in the resource, but it is not strictly needed. To exploit this, do the following:
1. Claim the bucket with the predictable name of the victim.
2. Create a lambda function to execute upon bucket access from the victim's account.
3. Wait for them to start using CF in that region.
4. In the lambda function, modify the CF template that was written before execution.
5. Run the backdoored template file. This will create an admin role that the attacker controls and can assume, giving them admin access to the account. To do this, the executor of the CF template must have IAM permissions.
The bucket name has a random value associated with it. For whatever reason, they were unable to figure out how it is generated. Via OSINT on GitHub, it seems feasible to find exposed hashes.
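To see why one leaked hash matters, here is a sketch that lists the per-region bucket names for an account, which is also useful for auditing which names you have not yet claimed yourself. The name pattern (prefix, hash, region joined with dashes), the prefix value and the region list are assumptions for illustration, since this summary does not spell out the format.

```python
# Assumption: names follow "<prefix>-<hash>-<region>"; adjust to the real format.
REGIONS = ("us-east-1", "us-west-2", "eu-west-1", "ap-southeast-2")


def candidate_bucket_names(account_hash: str,
                           regions: tuple = REGIONS,
                           prefix: str = "cf-templates") -> dict:
    """Map each region to the bucket name the service would be expected to use there."""
    return {region: f"{prefix}-{account_hash}-{region}" for region in regions}


if __name__ == "__main__":
    # "abc123def456" is a made-up hash, not a real account value.
    for region, name in candidate_bucket_names("abc123def456").items():
        print(region, name)
```

Because the hash is the same in every region, the only unknown left is which regions the victim has not used yet.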
From there, they wanted to find other services vulnerable to the same issue. They found AWS Glue, EMR, SageMaker, CodeStar and Service Catalog vulnerable to the same attack. In terms of impact, sometimes it was RCE on the service itself or data manipulation.
Many AWS services run with the permissions of the user itself. Since this is the case and the bucket permissions are open, this exploit makes sense. However, I'm fairly confident this is a standard thing that AWS engineers should test for, making it surprising how many cases of this were found, especially in such big services. Overall, great post though!

Front-End Frameworks: When Bypassing Built-in Sanitization Might Backfire- 1472

Stefan Schiller - Sonar Source Reference →Posted 1 Year Ago

Modern JS frameworks like React, Angular and Vue safeguard against XSS. If you want to include input as HTML, there are mechanisms to do this, but they are dangerous.
Vue.js escapes values rendered with the mustache template syntax. To render raw HTML, the v-html attribute can be added instead.
In Firefly III, they spotted an issue where a web request response was rendered with the unsafe HTML rendering. At first glance, the response is not controlled by the attacker. However, the web request used user input as an ID in the request path. Hence, path traversal was possible, but only on the client side.
The author got somewhat lucky here - they found a field with the same key that was being reflected with data from the request. Hence, the traversal led to XSS.
The page had a good CSP preventing big attacks. For the fix, since this is supposed to be raw HTML, they couldn't just remove the tag. Instead, the ID is now parsed as an INT and no dynamic data is returned. Overall, a good find and an interesting use case for client side path traversal.
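The client-side traversal and the integer-parsing fix can be modelled outside the browser. This sketch uses Python's URL resolution as a stand-in for what the front end does. The host name and API paths are hypothetical, not Firefly III's real routes.

```python
from urllib.parse import urljoin

BASE = "https://firefly.example/api/v1/accounts/"  # hypothetical endpoint


def build_url_unsafe(account_id: str) -> str:
    """User input is placed straight into the request path."""
    return urljoin(BASE, account_id)


def build_url_safe(account_id: str) -> str:
    """Mirror of the fix: parse the ID as an integer first (raises ValueError otherwise)."""
    return urljoin(BASE, str(int(account_id)))


if __name__ == "__main__":
    print(build_url_unsafe("../attachments/5"))  # request lands on a different endpoint
    print(build_url_safe("12"))
    try:
        build_url_safe("../attachments/5")
    except ValueError:
        print("rejected")
```

The dot segments are resolved before the request is sent, so the server never sees anything unusual.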

Android Jetpack Navigation: Go Even Deeper- 1471

Artem Kulakov - PTSwarm Reference →Posted 1 Year Ago

Jetpack Compose is a new way of building UIs in Android, replacing the fragments style. Now, screens are represented by composable functions. Hence, the Jetpack Navigation library is used for navigating users between screens as well.
A developer can do this using deep links in Android. In previous research, it was discovered that somebody can route to arbitrary pages of the application if these are controlled, even if the app doesn't support any.
The Jetpack Navigation library has some implicit deep links. Internally, it assigns a deep link to each created route, which the dev isn't even aware of. As a result, a malicious application on the device can execute the handler.
The recommendation to users is to simply NOT use this library. An example exploit was bypassing a PIN screen in the app by force browsing to a different screen. Good post!

Why ORMs and Prepared Statements Can't (Always) Win- 1470

Thomas Chauchefoin - Sonar Source Reference →Posted 1 Year Ago

Soko is Go software for publishing Gentoo Linux packages. It uses an ORM which should in theory make us safe against SQL injection attacks.
However, the code authors were misusing the prepared statements API. Instead of having the ORM do the SQL query mapping, they were concatenating user controlled data directly into OrderExpr. As a result, the escaping wouldn't be done.
This leads to a trivial SQL injection within the search functionality, leading to arbitrary database leakage. The package also supported stacked queries! This allows an attacker to finish one query and start a new SQL call. The COPY FROM PROGRAM feature can then be used to execute arbitrary code on the system.
The feature used for RCE requires a privileged user. However, since it's run in a Docker container, the executing user is root, bypassing these checks. It's interesting that running as root in a Docker container had some serious consequences. Overall, a good and snappy post on finding SQLi in weird places.
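A sketch of why concatenating into an ORDER BY clause is injectable even when the rest of the query is parameterized, and one common mitigation (an allowlist of sortable columns). It uses SQLite from the Python standard library as a stand-in. Soko's real code is Go with a different database, and the table and column names here are made up.

```python
import sqlite3

SORTABLE_COLUMNS = {"name", "version"}  # hypothetical allowlist


def build_query_unsafe(order_by: str) -> str:
    # User input goes straight into the SQL text; no escaping can happen here.
    return "SELECT name, version FROM packages ORDER BY " + order_by


def build_query_safe(order_by: str) -> str:
    # Identifiers cannot be bound as parameters, so restrict them to known values.
    if order_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unsupported sort column: {order_by!r}")
    return "SELECT name, version FROM packages ORDER BY " + order_by


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, version TEXT)")
conn.execute("CREATE TABLE secrets (token TEXT)")
conn.executemany("INSERT INTO packages VALUES (?, ?)", [("vim", "1.0"), ("bash", "2.0")])
conn.execute("INSERT INTO secrets VALUES ('s3cr3t')")

# The sort order now depends on secret data: a one-bit oracle per request.
TEMPLATE = "(CASE WHEN (SELECT token FROM secrets) LIKE '{}%' THEN name ELSE version END)"
rows_guess_right = conn.execute(build_query_unsafe(TEMPLATE.format("s"))).fetchall()
rows_guess_wrong = conn.execute(build_query_unsafe(TEMPLATE.format("x"))).fetchall()

if __name__ == "__main__":
    print(rows_guess_right)
    print(rows_guess_wrong)
```

The two result orders differ, so an attacker can recover the secret one character at a time without stacked queries at all.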

Threshold Transaction Malleability Bugfix Review- 1469

Immunefi - Kayaba Reference →Posted 1 Year Ago

The Threshold Network is a collection of various services that use threshold cryptography by relying on multiple secret keepers. One of these services is tBTC, which bridges native assets. The mechanism it used for checking against the Bitcoin Merkle root is busted.
The mechanism uses Simplified Payment Verification (SPV), which is light client verification for Bitcoin. When SPV needs to verify a transaction, it only has the Merkle root and block hash to verify that a given transaction is in the tree. This is similar to how IBC works.
SPV doesn't include the number of transactions. The hashed values are 32 bytes long and the transactions in question are 64 bytes long. To get the parent of a transaction, we hash it. To generate a non-leaf node, we concatenate the two hashes together and then hash the result.
Because there is no tracking of the count and no delimiters between the data, it's possible to trick the system into thinking that a raw transaction is there when it's actually not. This is done by adding an extra transaction below a node (which gets hashed) to the value we want.
Within a transaction, the information is mostly random but many of the bytes are controlled by the attacker. This would require a lot of grinding but is feasible.
The developers knew about this exact issue and it's even in Linux documentation. However, the developers deemed it unexploitable because it's only doable with transactions that are 64 bytes in size, which most were not. The bug hunter realized that this was NOT the case though - coinbase transactions could be used.
A malicious miner could create a 64 byte coinbase transaction that would be accepted by the network. To fix the issue, a length check was added, along with actual validation of the coinbase proof itself.
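A simplified sketch of an SPV-style inclusion check with and without the length check. It is not the tBTC implementation: the function names and placeholder transactions are my own, and the coinbase proof validation is left out.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def verify_path(leaf: bytes, siblings: list[bytes], index: int, root: bytes) -> bool:
    """Walk from a leaf hash up to the root using the sibling hashes."""
    node = leaf
    for sibling in siblings:
        node = dsha256(node + sibling) if index % 2 == 0 else dsha256(sibling + node)
        index //= 2
    return node == root


def spv_verify_naive(raw_tx: bytes, siblings: list[bytes], index: int, root: bytes) -> bool:
    return verify_path(dsha256(raw_tx), siblings, index, root)


def spv_verify_checked(raw_tx: bytes, siblings: list[bytes], index: int, root: bytes) -> bool:
    if len(raw_tx) == 64:  # could be an inner node's preimage, so refuse it
        return False
    return spv_verify_naive(raw_tx, siblings, index, root)


# Four placeholder transactions and their tree.
leaves = [dsha256(b"tx-%d" % i) for i in range(4)]
n01 = dsha256(leaves[0] + leaves[1])
n23 = dsha256(leaves[2] + leaves[3])
root = dsha256(n01 + n23)

# The 64 bytes under n01 presented as a "transaction" with a shorter path.
fake_tx = leaves[0] + leaves[1]

if __name__ == "__main__":
    print(spv_verify_naive(fake_tx, [n23], 0, root))             # accepted
    print(spv_verify_checked(fake_tx, [n23], 0, root))           # rejected
    print(spv_verify_checked(b"tx-2", [leaves[3], n01], 2, root))  # real tx still verifies
```

Since the verifier never learns the depth of the tree, the shorter path is indistinguishable from a legitimate proof in a smaller block.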

Ambush Attacks on 160-bit Object IDs and Addresses- 1468

Mysten labs Reference →Posted 1 Year Ago

If SHA256 was ever broken, much of the world would break down. This is especially true of many blockchain protocols as well. Why is this bad?
Contracts could be deployed to the same address, resulting in funds getting stolen. Geth has a fix that makes this impossible, but it's still interesting. Bridges using transaction IDs would only see one transaction instead of two. There are also application-specific reasons why this would be bad.
Most things use 256 bits, such as object IDs in Sui. However, Ethereum addresses are only 160 bits, which gives 2 ** 80 collision security. The authors show that the cost of an attack would be in the range of 1-10 million dollars, based on some math from Bitcoin hashing profits and Facebook research. Anything under a billion dollars is a real threat.
What makes a hash function secure then?
1. Preimage resistance. Given an output x, find an arbitrary message m that hashes to x. This is typically 2 ** length.
2. 2nd-preimage resistance. Given a message m1, find a different message m2 that shares the same hash. This is typically 2 ** length as well.
3. Multi-target 2nd-preimage resistance. Given a set of hashes, can we find a matching hash for any of them? This is typically 2**(n-k) where n is the length of the hash in bits and 2**k is the size of the set we're checking against.
4. Collision resistance. The birthday paradox. For 160 bit hashes, the effort is 2**80, for instance.
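The generic work factors in the list above are easy to tabulate. This sketch only covers brute-force bounds and assumes the hash has no structural weakness. Here k is the base-2 logarithm of the number of targets.

```python
def preimage_bits(n: int) -> int:
    """Preimage and 2nd-preimage: about 2**n hash evaluations."""
    return n


def multi_target_bits(n: int, k: int) -> int:
    """Hitting any one of 2**k targets: about 2**(n - k) evaluations."""
    return n - k


def collision_bits(n: int) -> float:
    """Birthday bound: about 2**(n / 2) evaluations."""
    return n / 2


if __name__ == "__main__":
    for n in (160, 256):
        print(n, preimage_bits(n), multi_target_bits(n, 32), collision_bits(n))
```

For a 160 bit identifier with about four billion (2**32) existing targets, the multi-target bound drops from 160 to 128 bits, and the collision bound is only 80 bits.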
Overall, an interesting threat modeling of hash collisions. Many of the things listed above are annoying buzz words and I liked how it was explained in the article.

Confusion Attacks: Exploiting Hidden Semantic Ambiguity in Apache HTTP Server! - 1467

Orange Tsai Reference →Posted 1 Year Ago

The Apache HTTP server is constructed with modules, with 136 listed in the documentation and about half of those in normal use. To the author, there was a bad code smell: a giant request_rec structure is passed around to each module. If two modules differed in their understanding of a field, it'd be bad. This is what the research is about.
The structure contains a field called filename to represent the filesystem path. However, some of the modules treat this as a full URL, which can lead to security issues. This can be used to truncate entries using a ? in the path. For instance, mod_rewrite allows sysadmins to easily rewrite a path pattern with the RewriteRule directive. By providing a question mark here, the rewritten path will be truncated, resulting in a bad access. Another example of the truncation being useful is with a RewriteRule on the path.
The other interesting issue with the filename confusion is an ACL bypass. It's common to use the Files directive to add authentication to a file access. Using the confusion on the file path with the URL-encoded question mark, we can get one path verified but another actually used. For instance, admin.php%3Fooo.php would be checked against the ooo.php at the end but served as admin.php.
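A toy model of the two interpretations, to make the bypass concrete. These functions are not Apache's code. They only mimic one component that treats the value as a filesystem name and another that treats it as a URL.

```python
from urllib.parse import unquote


def acl_protects(request_path: str, protected_name: str = "admin.php") -> bool:
    """Component A: treats the decoded value as a filename and matches its last segment."""
    filename = unquote(request_path)
    return filename.rsplit("/", 1)[-1] == protected_name


def file_served(request_path: str) -> str:
    """Component B: treats the decoded value as a URL and drops everything after '?'."""
    return unquote(request_path).split("?", 1)[0]


if __name__ == "__main__":
    path = "/admin.php%3Fooo.php"
    print(acl_protects(path))   # the access rule for admin.php does not match
    print(file_served(path))    # yet admin.php is what gets served
```

Neither component is wrong in isolation. The bug is that they disagree about what the same field means.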
The next bug is crazy. When Httpd is processing a request, it first looks at that exact spot on the file system with specific rewrite rules. Then, it attempts to go to the specified document root. Most of the time, the root directory isn't there so it doesn't matter though. This means that if the prefix of a RewriteRule is controllable then the entire file system can be accessed!
Well, sorta. Because of the rewrite rule having an ending attached to it (like .html), we can only access what this allows. Additionally, Apache has a built in protection for protecting against the access of some files. Using the first primitive allows us to truncate the path though, creating a super primitive. Using this bug, the author found they could disclose arbitrary source code.
Even though there are restrictions on what can be accessed by default, we can use gadgets. The LibreOffice file at /usr/share/libreoffice/help/help.html contains an XSS. Some libraries, such as WordPress plugins, could be used for LFI via tutorials. They mention a few other ways to exploit this, including abusing symbolic links.
In Apache, there are two directives that do the same thing: AddHandler and AddType. Under the hood, there is some magic from 1996 to allow for both to be used by using the content_type field as the module handler when the handler field is empty. This new primitive is the ability to overwrite the function handler.
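The fallback described above can be modelled in a few lines. This is a toy model of the behaviour as summarized here, not Apache's source, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    handler: Optional[str] = None        # set by AddHandler-style configuration
    content_type: Optional[str] = None   # set by AddType-style configuration, or later


def effective_handler(r: Request) -> Optional[str]:
    """When no handler is set, the content type doubles as the handler name."""
    return r.handler or r.content_type


if __name__ == "__main__":
    print(effective_handler(Request(handler="cgi-script", content_type="text/html")))
    print(effective_handler(Request(content_type="application/x-httpd-php")))
```

Anything that can change content_type while handler is empty therefore chooses which module processes the request.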
The first instance of this being exploited was in mod security. When an error occurred while processing a path, it wasn't handled correctly and the Content-Type was overwritten. As a result, the wrong handler was executed, returning the PHP source code instead of the result of running the PHP. This technique could be used in conjunction with other content type changes as well.
Next, if an attacker can control the Content-Type header in the response, they can invoke ANY handler. Even though this processing happens after receiving, server-side redirects make this exploitable to hit any CGI implementation on the server. The author mentions an SSRF with controlled headers or CRLF injection as potential ways to do this.
How does this become exploitable? Getting an image file to be processed as a PHP script can quickly lead to RCE. mod_proxy leads to a full SSRF or direct access to unix sockets. Finally, they found that PEAR.php, included with Docker, can even be used to get RCE through PHP.
At the end of the article, they say this is promising for more research. The author only focused on issues in a few impactful fields but there may be other fields that cause as much havoc. The more complex a code base is the more unique vulnerabilities are likely lurking there. Amazing research, as always by Orange Tsai :)

0.0.0.0 Day: Exploiting Localhost APIs From the Browser- 1466

Avi Lumelsky - Oligo Security Reference →Posted 1 Year Ago

Browsers can request any data via HTTP using JavaScript. From a website, it's possible to make requests to items on the local network, such as localhost. Should this be allowed? IP scanning and attacks on the LAN are very possible here.
All major browsers have CORS - but this is only for response data and not the outbound. So, Chrome released a standard called Private Network Access (PNA). This extends CORS to restrict the ability to send requests to PNA domains.
PNA has a large list of domains that fall into the private category. While doing research into this topic, they noticed that 0.0.0.0 was not in the list though. Is this bad? 0.0.0.0 has multiple uses but it commonly just means localhost.
Since requests to 0.0.0.0 are allowed, this bypasses PNA completely for localhost. Many local apps skip CSRF or authentication checks solely because they assume only local software can reach them.
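A sketch of how a blocklist of private ranges can miss 0.0.0.0. The range list is an illustrative subset that I picked, not the browser's actual list.

```python
import ipaddress

# Illustrative subset of "private" ranges; 0.0.0.0 falls in none of them.
NAIVE_PRIVATE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private_naive(host: str) -> bool:
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in NAIVE_PRIVATE_RANGES)


def is_private_fixed(host: str) -> bool:
    """Also treat the unspecified address (0.0.0.0) as private."""
    return ipaddress.ip_address(host).is_unspecified or is_private_naive(host)


if __name__ == "__main__":
    for host in ("127.0.0.1", "0.0.0.0", "93.184.216.34"):
        print(host, is_private_naive(host), is_private_fixed(host))
```

Any server-side or client-side filter built like the naive version has the same gap, which is why the address is worth checking for explicitly.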
They found that an application called Ray used by developers could be exploited for RCE. Selenium Grid had a similar issue as well as PyTorch.
How do we fix this? PNA headers will be added to requests. For the browser to make these requests, its preflight carries Access-Control-Request-Private-Network: true and the target must respond with Access-Control-Allow-Private-Network: true, similar to how CORS works. Good bug write-up and a good explanation of an incoming feature!