Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

Out-of-Cancel: A Vulnerability Class Rooted in Workqueue Cancellation APIs - 1932

v4bel - Posted 8 Hours Ago
  • The function cancel_work_sync() cancels a pending work item in the Linux kernel and waits for a running instance to finish, but the same work can be queued again through a separate path. Unlike tasklets, workqueue-based execution doesn't provide a reliable way to control an object's lifetime using cancellation alone, so disable_work_sync() was added to address this. The author of the post was not satisfied with this state of affairs: the subtle design has led to multiple race condition vulnerabilities in code that relies on synchronous work cancellation.
  • The author notes that this isn't a missing lock or a forgotten condition: it is a fundamental design issue. The cancel_*() APIs are treated as a synchronization barrier for the object's lifetime. While a cancel call can stop or clean up what is running right now, it does not guarantee that the work will never run again. So, they named this bug class Out-of-Cancel and seem to expect more of these issues to turn up in the Linux kernel in the future.
  • ULP (Upper Layer Protocol) is a mechanism for hooking special code into TCP, before or after the regular TCP code runs. This gives it a lot of flexibility but also blurs the lines of object ownership, lifetime, and execution context. Given the complexity of this and of the TCP state machine, small implementation mistakes can cause other parts of TCP to behave in weird ways. One of these ULPs is an ESP transport based on RFC 8229 (espintcp). Although they found bugs in several locations, the main focus is a CVE within espintcp.
  • Within espintcp_close() the code calls cancel_work_sync() when it should call disable_work_sync(). This leaves the work schedulable again, even though the function goes on to run cleanup code. This leads to a classic use-after-free scenario. The rest of the post is all about hitting the race condition reliably and requires a deep understanding of the Linux kernel to grasp.
  • At the end of the article, they show the patches for this bug and three others that they found. In all three cases, the patch looks the same: use disable_delayed_work_sync() instead of cancel_delayed_work_sync(). The article is interesting, even without all of the technical concepts on binary exploitation. They identified a bad design pattern and then found multiple places where it was misused. That's great research!
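  • To make the cancel/disable difference concrete, here is a minimal userspace sketch in C. It is my own toy model, not code from the article and not kernel code: struct toy_work, struct toy_ctx, and every toy_* function are hypothetical names, the model is single-threaded (so it shows the ordering problem, not the actual race), and the "free" is just a flag. The only behavior it borrows from the real APIs is the one described above: a cancel drops the pending work but allows requeueing, while a disable also refuses later queue attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a work item. Hypothetical; NOT the kernel's struct work_struct. */
struct toy_work {
    bool pending;       /* queued and waiting to run */
    int  disable_depth; /* > 0 means queue attempts are refused */
};

/* Toy stand-in for an object that owns a work item (e.g. a socket context). */
struct toy_ctx {
    struct toy_work work;
    bool freed;         /* the model sets a flag instead of calling free() */
};

/* Models queueing: returns true only if the work was newly queued. */
static bool toy_queue_work(struct toy_work *w)
{
    assert(w != NULL);
    if (w->disable_depth > 0 || w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Models a synchronous cancel: drops the pending instance, nothing more. */
static bool toy_cancel_work_sync(struct toy_work *w)
{
    assert(w != NULL);
    bool was_pending = w->pending;
    w->pending = false;
    return was_pending;
}

/* Models a synchronous disable: cancel, and also block future queueing. */
static bool toy_disable_work_sync(struct toy_work *w)
{
    assert(w != NULL);
    w->disable_depth++;
    return toy_cancel_work_sync(w);
}

/*
 * Close path followed by a late event from "another path".
 * Returns true if the late event managed to queue work on an object
 * that the close path already cleaned up (a use-after-free in real code).
 */
static bool toy_close_then_late_event(struct toy_ctx *ctx, bool use_disable)
{
    toy_queue_work(&ctx->work);            /* some work is in flight */

    if (use_disable)
        toy_disable_work_sync(&ctx->work); /* patched pattern */
    else
        toy_cancel_work_sync(&ctx->work);  /* vulnerable pattern */

    ctx->freed = true;                     /* cleanup happens here */

    return toy_queue_work(&ctx->work);     /* late requeue attempt */
}
```

    With use_disable set to false, the late requeue succeeds even though the object is already marked freed; with true, the requeue is refused and nothing is left pending.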