Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Ethereum Log Confusion in Polygon's Heimdall- 1345

Felix Wilhelm - Asymmetric Research    Reference → Posted 2 Years Ago
  • The Polygon proof of stake network relies on three different parts: a consensus layer called Heimdall, an execution layer called Bor (fork of Geth) and a set of smart contracts. For this vulnerability, we'll be looking into the smart contracts and Heimdall layer.
  • Heimdall is a forked version of Cosmos and Tendermint. Unlike most blockchains, staking is implemented within the smart contracts instead of natively. To do this, all events from Ethereum are picked up and processed natively if they come from the proper location. To prevent stakers from fabricating their own stake, Heimdall uses side handlers to verify that the event actually occurred.
  • Any verification of Ethereum logs is critical code to get right. Two common issues are equality checks that don't verify all fields and insecure parsing of Ethereum log messages. The function responsible for decoding all of the event information properly verifies the emitting address. Once it calls UnpackLog() to get further information, though, we have a problem.
  • UnpackLog() is used to decode an Ethereum event log into a Golang struct. However, there is a missing check on the log's topic. Each event type has its own selector (the first topic), just like functions do. From parsing, the only restriction was that the log had to have the same number of indexed parameters. With this, we have a type confusion vulnerability.
  • The StakeUpdate call is the most interesting one to exploit, but this affected a bunch of other handlers. The goal is to use a SignerChange() event to trigger a StakeUpdate() and steal all of the funds. The validatorIds correspond to the same field, and the signer address in SignerChange lands in the amount slot (which would be a crazy large value). The final thing that has to line up is the nonce field on the update, which gets filled with an address from the change.
  • Is it feasible to line up a nonce with an address!? There is an integer truncation that occurs on the nonce while processing. So, we only need the final 8 bytes of an address to line up with a valid nonce! Is this possible?
  • If we consider a valid nonce to be 0x0-0xFFFF, then we have to generate about 2**51 addresses for a 50% chance of a hit. According to Felix, an EC2 P5 instance with 8 Nvidia H100 GPUs could do this in a fairly reasonable amount of time, for a cost somewhere between 50K and 100K. There's a MEV bot whose address is 06f65, but ours would be 16 times more difficult to generate.
  • To exploit this, perform the following steps:
    1. Change the signing key of the validator to be the address that we brute forced.
    2. Increase the nonce of the validator to match the address by performing various operations.
    3. Perform a signer change with an address under our control. This will process the fake MsgStakeUpdate() event.
  • Once the fake message is accepted, we have a crazy amount of MATIC staked. With this, we gain a supermajority over the network. Although the smart contracts hold the actual funds (which we can't simply steal), we can still do quite a bit. So, Felix chose to attack state-sync to actually steal funds.
  • State-sync is a mechanism used by Polygon PoS to push events from Ethereum L1 to the network. Since it processes incoming transfers from the Polygon bridge and the Plasma bridge, it's a super interesting attack surface. State-sync also lets validators keep participating in consensus during an RPC outage by agreeing with the supermajority.
  • Felix's idea was to use his inflated voting power on the Heimdall network to claim that arbitrary L1 events occurred within this secondary voting process. By triggering arbitrary events, messages can be injected as if they came from the L1 state, enabling an infinite mint on the network. Roughly $2B was at stake at the time of reporting.
  • The author offers a couple of suggestions for limiting impact. First, add time locks/withdrawal delays on large amounts so defenders have time to react during an attack. Second, set transfer limits on inflows and outflows to cap how much an attacker can steal. Finally, invariant testing on things like the total staked amount, a single validator holding a supermajority, and whatever else. Overall, an amazing write-up on how Polygon actually works and the ways that parsing can go wrong!
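The core of the parsing bug can be sketched in a few lines of Python. This is a toy model, not Heimdall's actual Go code: the emitter address, selectors, and field layout below are all made up for illustration. The point is that a decoder which verifies the emitting address and the indexed-parameter count, but never checks the event selector in topics[0], will happily decode a SignerChange-shaped log as a StakeUpdate.

```python
STAKING_INFO = "0xstakinginfo"           # hypothetical emitter address
STAKE_UPDATE_SELECTOR = "0xaaaa"         # stand-ins for keccak256 event selectors
SIGNER_CHANGE_SELECTOR = "0xbbbb"

def decode_stake_update(log, check_selector=False):
    """Decode a log as StakeUpdate(validatorId, nonce, newAmount)."""
    if log["address"] != STAKING_INFO:
        raise ValueError("wrong emitting address")       # this check existed
    if len(log["topics"]) != 4:                          # topic0 + 3 indexed params
        raise ValueError("wrong indexed parameter count")
    if check_selector and log["topics"][0] != STAKE_UPDATE_SELECTOR:
        raise ValueError("wrong event selector")         # the missing check
    return {
        "validator_id": log["topics"][1],
        # the nonce is truncated to 8 bytes during processing, which is what
        # makes lining an address up with a valid nonce feasible at all
        "nonce": log["topics"][2] & 0xFFFFFFFFFFFFFFFF,
        "new_amount": log["topics"][3],
    }

# A SignerChange log: validatorId, an address whose low 8 bytes form a small
# nonce, and another address that gets reinterpreted as a gigantic amount.
signer_change = {
    "address": STAKING_INFO,
    "topics": [SIGNER_CHANGE_SELECTOR, 7, 0xAB0000000000001234, 2**159],
}
confused = decode_stake_update(signer_change)  # accepted: type confusion
```

With the selector check enabled, the same log is rejected up front, which is essentially what the fix enforces.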

Price calculation can be manipulated by intentionally reverting some of the price feeds- 1344

KupiaSec
  • Within the Olympus ecosystem, they have three different price feeds that can be used. If one of them reverts, then it simply uses the other ones. So, what could possibly go wrong?
  • The key to the issue is the reverting. What if we could force a price feed to fail? If we could, the protocol would fall back to the remaining feeds, letting us pick which source sets the price. Selective failures can be really bad in blockchain systems, since they let you game the system.
  • Uniswap and Balancer both have reentrancy checks. So, if an attacker calls into Olympus from within a Uniswap or Balancer callback, those price feeds will revert. Bunni uses Uniswap under the hood for some things, so deliberately triggering that functionality makes it possible to force a revert there as well.
  • Overall, a super interesting bug with a VERY long chain of questions being asked. I really enjoyed it, since it thinks outside the box and requires a deep understanding of the protocols being interacted with. In the end, they ranked this as a medium, which is fair since Chainlink could not be manipulated this way.
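The failure mode is easy to model in a toy aggregator (none of this is Olympus's actual code; the feed names, prices, and the averaging scheme are invented for illustration): if reverting feeds are silently skipped, an attacker who can trip some feeds' reentrancy guards chooses which feeds survive and therefore what the price is.

```python
class FeedReverted(Exception):
    """Stand-in for an on-chain revert (e.g. a tripped reentrancy guard)."""

def aggregate_price(feeds):
    # average every feed that responds; silently skip any feed that reverts
    prices = []
    for feed in feeds:
        try:
            prices.append(feed())
        except FeedReverted:
            continue
    return sum(prices) / len(prices)

def chainlink():      return 100.0
def uniswap_twap():   return 98.0
def balancer_pool():  return 105.0

def tripped():
    # what uniswap_twap/balancer_pool look like when the attacker calls in
    # from inside one of their callbacks: the reentrancy check reverts
    raise FeedReverted

honest   = aggregate_price([chainlink, uniswap_twap, balancer_pool])  # all three
attacked = aggregate_price([chainlink, tripped, tripped])  # only one survives
```

Under attack, the aggregate collapses onto whichever feed is left standing, which is exactly why selective failure is valuable to an attacker.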

Browsing for Bugs: Finding and Reporting a $3M Bug in Premia Finance- 1343

Ayaz Mammadov
  • Before even diving into the target itself, the author goes through how they pick a target. Ecosystem: the more mature the thing, the more bugs it's going to have. TVL range: very large means lots of auditors, very small means fewer audits. Project type: knowing the common pitfalls of specific functionality lets you quickly understand a project. Forks: if a bug is found in a project with many forks, then many things are vulnerable; additionally, changes to forks can easily be diffed.
  • They then talk about trading off depth, breadth, and speed. Since they were browsing in their free time, I'm guessing they were going for speed. They tend to look for things with a TVL of 5M+ and nice bug bounties.
  • The sendFrom() function is responsible for sending staked users' tokens across chains. Additionally, a user can allocate an allowance to another user for these cross-chain calls. Inversely, the function _debtFrom() checks that the user calling sendFrom() has been allocated the proper allowance from the sending address. If so, it burns the tokens and then sends them off to another chain.
  • The allowance check is where the bug sits. Instead of checking the classic owner->spender->amount mapping as it should, it was checking spender->spender->amount. Since an attacker controls both of those keys, they could grant themselves an allowance and then arbitrarily spend the funds of other users. Yikes!
  • This attack could have stolen $3M worth of funds. It's insane that such a simple bug slipped through the cracks despite audits. A good find nonetheless.
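The broken lookup fits in a few lines of Python (a toy model; the real code is Solidity, and the function names here are made up): the buggy check indexes the allowance mapping with the spender twice, so the spender only ever consults an entry they control themselves.

```python
# allowances should be consulted as owner -> spender -> amount
allowances: dict[str, dict[str, int]] = {}

def approve(owner: str, spender: str, amount: int) -> None:
    allowances.setdefault(owner, {})[spender] = amount

def allowance_check_buggy(owner: str, spender: str, amount: int) -> bool:
    # BUG: indexes spender twice instead of owner -> spender
    return allowances.get(spender, {}).get(spender, 0) >= amount

def allowance_check_fixed(owner: str, spender: str, amount: int) -> bool:
    return allowances.get(owner, {}).get(spender, 0) >= amount

# The attacker approves themselves; the victim never approved anyone.
approve("attacker", "attacker", 10**24)
```

The buggy check now passes for ("victim", "attacker") even though the victim granted nothing, while the fixed check correctly refuses.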

Jumpserver Preauth RCE Exploit Chain- 1342

Zhiniang Peng
  • JumpServer is an open-source privileged access management (PAM) system. Typically, a jump server is a machine that can be reached from the outside world in order to talk to sensitive hosts on an internal network. So, being able to compromise this software would make it an awesome target for attackers.
  • When looking at something for the first time, the authentication system is great to look at. Any mistakes here lead to a compromise of the whole system! Password reset flows are an extremely common item to attack under this, since it's giving users access to their account without their original password. An implementation flaw in this is effectively an authentication bypass.
  • While scrutinizing the password reset flow (which was the standard generate random number, find number in email, click link, reset password), they noticed that it was using the random library from Python. Since this is known to NOT be cryptographically secure, this is a red flag. However, breaking this remotely is feasible in some scenarios but not all. So, now what?
  • Algorithmic random number generators need a starting value, otherwise known as a seed. If you know the seed, then you can predict every number that will be generated going forward, regardless of how random the algorithm's output looks. In a crazy turn of events, the author found a way to leak the seed of the generator.
  • The software uses a captcha to prevent brute-force attacks. After generating the captcha image, it uses the image's hex id as the seed. Since that id is sent back to the user, we now know the current seed of the generator. So, how do we actually exploit this? It feels super racy.
  • The author decided to spam these with a bunch of different threads. First, call the endpoint with the hex captcha key as the parameter to reset the random seed. Then, with that seed set, trigger the reset-code generation. Finally, submit several codes based on where we think the generator's state would be. Since the server uses Gunicorn, which employs the pre-fork worker model, all of the worker processes needed to be poisoned for this to work. In practice, the seed plus 980 bytes of generator output needed to be accounted for, because of other randomness being consumed.
  • Overall, this is a crazy vulnerability. I've never seen a way to leak the seed by requesting other information. The post also contains a post-auth RCE but the password reset was my favorite part.
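The heart of the bug can be sketched in a few lines. This is simplified: the real chain had to poison every Gunicorn worker and account for roughly 980 bytes of other generator output, and the six-digit code format below is my assumption. The essential fact is that Python's random module is a deterministic Mersenne Twister, so anyone who learns the seed can replay the reset code exactly.

```python
import random

def server_reset_code(captcha_id: int) -> str:
    # the server (re)seeds the global generator with a value derived from
    # the captcha id - a value the client gets to see
    random.seed(captcha_id)
    return "".join(random.choice("0123456789") for _ in range(6))

def attacker_predicted_code(leaked_id: int) -> str:
    # same algorithm, same seed, same draws => same code, computed offline
    rng = random.Random(leaked_id)
    return "".join(rng.choice("0123456789") for _ in range(6))
```

For any leaked id, the attacker's offline prediction matches the server's code byte for byte, which is why the quality of the algorithm's output is irrelevant once the seed is known.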

Writing Upgradeable Contracts- 1341

OpenZeppelin
  • The blockchain is immutable. However, we don't necessarily want our code to be immutable, since we like being able to fix it up. This is done with proxies - a contract that sits in front of our implementation and forwards calls via delegatecall() so the two share storage.
  • There is a problem though: the constructor only runs once, at deployment of the contract. So, if we want similar setup logic for the proxy, we have to write it ourselves under the same constraints as a constructor. In OpenZeppelin, this is called the initializer pattern.
  • OpenZeppelin implements the initializer() modifier for us. This allows the initialize function to be called once and only once during the lifespan of the proxy.
  • Although the initializer() modifier is provided for us, it will not automatically call the parents' initializers the way a real constructor chains parent constructors. So, we must do that manually. To ensure those parent initializers can only run as part of our initialization, we use the onlyInitializing() modifier. There is also a reinitializer() modifier that can be used to allow further initializations after the initial one.
  • There are a few no-nos within these because of Solidity's internals. First, it's a bad idea to set initial values within field declarations. Why? This is equivalent to setting them within the constructor, so it doesn't work behind a proxy; set the values inside the initializer instead. Constants and immutables are fine since these are stored in the bytecode itself, but great care should be taken with them during upgrades.
  • Second, the implementation contract itself needs to disable initialization. This is because we don't want an attacker to initialize the implementation contract directly and, for example, trigger a self-destruct through a delegatecall().
  • Storage changes must be handled very carefully as well. Changing slots, types, and whatever else is absolutely terrifying, since the semantic meaning of existing storage can change.
  • Overall, an interesting pattern that is necessary for proxies that should be scrutinized carefully.
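The run-once guard at the heart of the pattern is language-agnostic, so here is a toy of it in Python (the real mechanism is OpenZeppelin's initializer modifier in Solidity, operating on a flag in the proxy's storage; this only models the guard logic):

```python
class Box:
    """Plays the role of an upgradeable implementation. __init__ stands in
    for the constructor, which never runs in the proxy's storage context,
    so the real setup must live in initialize()."""

    def __init__(self):
        self._initialized = False   # the guard slot

    def initialize(self, owner: str) -> None:
        # emulates the initializer modifier: callable once and only once
        if self._initialized:
            raise RuntimeError("already initialized")
        self._initialized = True
        self.owner = owner

box = Box()
box.initialize("alice")   # first call succeeds; any later call must fail
```

A second initialize("mallory") raises, which is the whole point: without the guard, whoever calls initialize first (or again) takes over the contract.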

Hunting down the HVCI bug in UEFI- 1340

Satoshi's notes
  • Hypervisor-Protected Code Integrity (HVCI) is a method of protecting various kernel parts even when an attacker has compromised part of the kernel itself. While creating a Windbg extension for Hyper-V on Intel processors, the author noticed something very strange: even with HVCI enabled, some of the pages were marked RWX, which shouldn't be possible. Of the 7 Intel devices they had, 3 had this issue. Exploitation was trivial, since the pages were already RWX. So, what happened?
  • Intel Virtualization Extension (Intel VT-x) is a virtualization standard that has been upgraded multiple times. For IOMMU device memory protection, memory can be exposed to the hypervisor in a few different ways, such as VM exits, VmBus with Hyper-V, and accelerated hardware mapped directly into the VM. To isolate memory between VMs, an IOMMU should provide I/O device assignment, DMA remapping, and delivery of interrupts to the appropriate VM.
  • When remapping is used, it must be announced by the BIOS in an ACPI table called DMA Remapping Reporting (DMAR). Some devices, like a network controller, need to perform DMA transfers to specific memory regions both before and after the IOMMU is set up. For these types of devices, there is another special structure called the Reserved Memory Region Reporting (RMRR).
  • For the RMRR structure, there are two important requirements. First, the BIOS should report memory described in the RMRR as Reserved in the UEFI memory map. Second, when the OS enables remapping, the address translations for these regions need to be marked RW. So, what's the bug in all of this?
  • The first point above (RMRR marked as Reserved in the memory map) was simply not happening. When the secure kernel starts, it goes through each of the reported regions to mark pages as either RW or RX, but not both. Since this region is never added to the map, it is never processed, which leaves the pages with their default RWX permissions - a violation of HVCI.
  • To fix the issue, the affected commercial devices added the RMRR regions to the memory map. Additionally, Windows removed the X permission from all RMRR memory regions to prevent this type of issue from happening in the future. Overall, an interesting root cause for a bug discovered by accident.
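The missing firmware behavior boils down to a coverage check, which can be sketched like this (purely illustrative: the tuples and addresses below are invented, and real DMAR/UEFI parsing is far more involved). Every RMRR range should fall inside a Reserved entry of the UEFI memory map; anything that doesn't is invisible to the secure kernel's permission pass.

```python
def rmrr_gaps(rmrr_ranges, memory_map):
    """Return the RMRR ranges NOT covered by any Reserved map entry."""
    reserved = [(base, base + size)
                for base, size, kind in memory_map if kind == "Reserved"]

    def covered(lo, hi):
        return any(lo >= b and hi <= e for b, e in reserved)

    return [r for r in rmrr_ranges if not covered(*r)]

# Hypothetical firmware state on an affected machine:
memory_map = [
    (0x00000000, 0x0009F000, "ConventionalMemory"),
    (0x7A000000, 0x00100000, "Reserved"),          # properly reported
]
rmrrs = [
    (0x7A000000, 0x7A020000),   # fine: inside the Reserved entry
    (0x8B000000, 0x8B010000),   # the bug: never reported as Reserved
]
missing = rmrr_gaps(rmrrs, memory_map)  # regions the secure kernel never sees
```

On the buggy machines, `missing` would be non-empty, and those are exactly the regions left with RWX permissions.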

Improving the state of Cosmos fuzzing- 1339

Gustavo Grieco - Trail of Bits
  • The Cosmos SDK is a blockchain development framework written in Golang. The security of this system is crucial, so fuzzing is integrated into the framework, which is what the author talks about. The framework has two types of fuzz tests: low level and high level.
  • The low level fuzzing uses a combination of AFL, go-fuzz and native Go fuzzing to test out small portions of code. These are awesome since they have code instrumentation to attempt to hit higher code coverage. For instance, the author shows a test for the function ParseCoinNormalized, which is part of the Coin implementation. Fuzzers can quickly find issues in stateless code like this but it becomes harder to find weird issues in the combined and stateful ecosystem.
  • For the high level, the Cosmos SDK has a Blockchain Simulator to test everything else. This tool executes random operations (transactions) starting from some genesis state, choosing random data to see if crashes or weird states occur.
  • Now, the low level uses smart (coverage-guided) fuzzing while the high level uses dumb fuzzing. So, the author decided to make the high level also support smart fuzzing! To do this across every module, they had to hijack a lower-level call to Rand. They found a few bugs, which is awesome. To me, this is the classic pattern - you always hear "I modified their fuzzer to do XYZ", because different fuzzers find different bugs.
  • Overall, I didn't know about the Cosmos SDK fuzzing framework. I may use this for future Cosmos testing on custom modules. We'll see how effective this fuzzing ends up being. Part of the problem, versus fuzzing C programs, is that a crash doesn't necessarily mean we have a cool bug. Many of the security-relevant bugs in the Cosmos SDK would violate invariants that this type of fuzzing isn't going to find.
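To make the low-level/stateless side concrete, here is dumb fuzzing of a parser in miniature (a Python sketch with a toy parser; the SDK's real harnesses are Go and target functions like ParseCoinNormalized): random inputs are thrown at the function, clean rejections are expected, and anything else is treated as a finding.

```python
import random
import string

def parse_coin(s: str):
    """Toy '<amount><denom>' parser standing in for ParseCoinNormalized."""
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    if i == 0 or i == len(s):
        raise ValueError("expected <amount><denom>")
    return int(s[:i]), s[i:]

def fuzz(parser, iterations=10_000, seed=1337):
    rng = random.Random(seed)               # fixed seed: reproducible runs
    alphabet = string.ascii_lowercase + string.digits + " -."
    findings = []
    for _ in range(iterations):
        s = "".join(rng.choice(alphabet)
                    for _ in range(rng.randint(0, 12)))
        try:
            parser(s)
        except ValueError:
            pass                            # clean rejection: expected
        except Exception as exc:            # crash-style bug: a finding
            findings.append((s, exc))
    return findings
```

A well-behaved parser produces no findings here, which is also the limitation the bullet above points out: "no crash" says nothing about higher-level invariants.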

Roll with Move: Secure, instant randomness on Aptos- 1338

Aptos Labs
  • Aptos Roll is a secure, instant randomness API, built on a bunch of pretty crazy cryptographic schemes. Unlike Chainlink VRF, this is on-chain, which makes it faster and cheaper to use. It seems similar to Ethereum's randomness function but appears to have better randomness properties.
  • Aptos decouples the consensus from execution. This is helpful because a shared secret can be generated then acted upon later in the execution stage. The approach allows a shared secret to be generated by using a weighted distributed key generation (wDKG). The shared secret can only be recovered by 50% or more of the validators, making it impossible to know the state ahead of time.
  • A seed for the randomness is generated using a weighted verifiable random function (wVRF) over the shared secret. To me: a secret-sharing scheme creates a secret, the secret is disclosed, the secret becomes the seed, and the function then generates random numbers deterministically from it. Pretty neat!
  • The blog post goes into the details of the Aptos network actually doing the sharing. Personally, I found it hard to follow because of the many acronyms and cryptography things I don't understand. Regardless, it's super cool and I wanted to make sure to at least have this in my notes.
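As a very rough toy of the flow I just described (this is nothing like the real wDKG/wVRF math; it only illustrates "revealed shared seed -> deterministic randomness", with made-up inputs): once the shared secret is recovered, every node can derive the same random value for each transaction, with no further communication.

```python
import hashlib

def block_randomness(shared_secret: bytes, block_height: int,
                     txn_index: int) -> int:
    # hash the shared secret together with the position, so every node
    # derives the same value and each transaction gets a different one
    material = (shared_secret
                + block_height.to_bytes(8, "big")
                + txn_index.to_bytes(8, "big"))
    return int.from_bytes(hashlib.sha256(material).digest(), "big")

secret = b"recovered-by-50%-of-validators"   # the disclosed shared secret
a = block_randomness(secret, 1000, 0)
b = block_randomness(secret, 1000, 0)
```

Every node computes the same value (a == b), yet the value cannot be predicted before the secret is recovered, which is the property the threshold scheme provides.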

ASLRn’t: How memory alignment broke library ASLR- 1337

Justin Miller
  • Address Space Layout Randomization (ASLR) is a security protection that randomizes the addresses of a process. By doing this, it forces exploits to have an information leak or get really lucky guessing. ASLR was one of the original memory corruption protections added to programs back in the day. In the post, the author discusses an issue with ASLR on Linux and how it came about.
  • While hacking on a CTF challenge, the author stumbled across ASLR not working. After talking to a friend, they noticed it only happened on libraries that were 2MB in size. On 32-bit, it didn't work at all; on 64-bit, many of the bits weren't randomized. But why? If it's 2MB, it must be huge pages!
  • Virtual address mappings are typically made with 4KB pages. However, when we want better TLB behavior, 2MB huge pages can be used. Instead of the 12-bit alignment of 4KB pages, huge pages are 21-bit aligned.
  • Years ago, some file systems moved to using thp_get_unmapped_area() for backing memory. This function recently had a change that makes allocations larger than 2MB use huge pages instead. Boom, that's the issue!
  • The missing bits come from this: we need a larger alignment than we did with 4KB pages, and the larger page size eats much of the randomness. To fix this, Ubuntu increased the number of random bits in an address for both 64-bit and 32-bit, giving a lot of the randomness back. Overall, a look into accidentally discovering an ASLR issue on Linux.
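The arithmetic behind the lost randomness is simple enough to write down (my own back-of-the-envelope, not the post's exact numbers): a mapping aligned to its page size has that many low address bits fixed at zero, and fixed bits contribute no entropy.

```python
def fixed_bits(page_size: int) -> int:
    # log2 for powers of two: how many low address bits alignment pins to 0
    return page_size.bit_length() - 1

four_kib = 4 * 1024            # normal page
two_mib = 2 * 1024 * 1024      # huge page

lost_bits = fixed_bits(two_mib) - fixed_bits(four_kib)  # 21 - 12 = 9
shrink = 2 ** lost_bits   # 512x fewer possible bases for an affected library
```

Nine bits may not sound like much, but a 512x reduction in possible load addresses is the difference between infeasible and brute-forceable in many exploitation scenarios, especially on 32-bit where the entropy was small to begin with.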

Memcached Command Injections at Pylibmc- 1336

d4d
  • Flask is a very popular Python-based web framework. The author was poking around their tech stack and noticed a library called Flask-Session, which is used for server-side session management and can use Redis or Memcached as its backend.
  • In a previous talk at Black Hat 2014, Ivan Novikov used a Memcached injection to get RCE via bad data deserialization. More recently, an SSRF in vBulletin was turned into RCE by injecting arbitrary serialized data into Memcached.
  • Memcached speaks a newline-delimited protocol. So, being able to put unescaped CRLF sequences into keys or values allows adding in extra commands. The two main commands are set and get.
  • When save_session() stores the information in Memcached, the set() call doesn't escape CRLF. As a result, a controlled session can be used to inject arbitrary commands.
  • Since raw \r\n bytes can't be sent in an HTTP request, they have to be encoded in the header. According to the spec (which I didn't know), this can be done by octal-escaping them with a backslash; for instance, \015\012 works.
  • I think they inject a pickle payload, which is stored in Memcached and later deserialized when the session is loaded, in order to get RCE. Overall, a super interesting bug class that I hadn't considered!
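The framing problem can be shown in a few lines (a sketch of the vulnerable pattern, not pylibmc's actual wire code; the key and command contents are invented): memcached commands are one per CRLF-terminated line, so a key containing an unvalidated "\r\n" smuggles a complete second command into the same write.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # vulnerable framing: the key is interpolated with no CRLF validation
    header = f"set {key} {flags} {exptime} {len(value)}\r\n"
    return header.encode() + value + b"\r\n"

# A "session key" that closes out the first set command with its own data
# block, then smuggles in a second, fully well-formed set command.
evil_key = "session:abc 0 0 4\r\ndata\r\nset injected 0 0 6\r\npwned!\r\nx"
wire = build_set(evil_key, b"junk")
lines = wire.split(b"\r\n")
```

From the server's point of view, the first line is a valid `set session:abc 0 0 4` whose data block is `data`, and the next line is a second `set injected 0 0 6` storing `pwned!` - which is exactly the primitive the pickle-based RCE above builds on (the injected value would be a malicious pickle instead of `pwned!`).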