Resources

People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!

C stdlib isn't threadsafe and even safe Rust didn't save us - 1594

EdgeDB - Posted 1 Year Ago
  • The authors of this post are porting a significant amount of networking code in EdgeDB from Python to Rust. While doing this, they have run into a lot of interesting issues, including the one in this post. While testing the port, they noticed that it always failed on ARM64 CI runners but nowhere else.
  • The CI runner appeared to hang for a while and then stop. Upon logging onto the CI box, they noticed that the program had actually crashed but this was not detected by the runner. They found a coredump, which indicated something weird had happened.
  • They loaded the coredump into GDB and saw that the crash was inside getenv(), called from the Rust code. The function was crashing while loading a byte from the environment variables: it was attempting to read from an invalid memory location. Why is libc crashing!?
  • One of their co-workers dropped a line: getenv isn't threadsafe. From looking at the crash dump, it was clear that while this thread was reading the environment variables, another one had written to them. In all likelihood, the memory set aside for the env vars was too small, so it was reallocated to be bigger. However, the other code was still reading from the old location.
  • The variables associated with the crash belonged to OpenSSL. In their code, they were using openssl-probe to probe for certificates, which was the offending code. Since they are using a combination of Python and Rust, Rust didn't think that an unsafe operation was happening.
  • To fix the bug, they moved from rust-native-tls to rustls. Calling try_init_ssl_cert_env_vars from Python would let a global lock prevent this race condition. Looking forward, Rust is marking the environment-setter functions unsafe and glibc has tried making getenv more thread-safe.
  • Why does this only happen on ARM? The crash occurs in a call to realloc within setenv. To hit this code path, the environment variables need to line up just right for the realloc to cause issues in getenv(). Given this information, they're pretty lucky that they found this at all.
  • Personally, a really good read. Learning about debugging techniques and interesting bugs is fun!
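
To make the "global lock" idea above concrete, here is a minimal sketch in Rust. It is not the fix EdgeDB shipped, and the names ENV_LOCK, locked_get, locked_set and run_demo are hypothetical ones made up for illustration. It assumes every environment access in the process goes through these two helpers. That assumption is exactly what broke in the post: C code calling getenv() directly never takes a lock like this.

```rust
use std::env;
use std::sync::Mutex;
use std::thread;

// Hypothetical process-wide lock: every environment read or write in this
// sketch goes through it, so a reader never overlaps a writer.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Read a variable while holding the lock.
fn locked_get(key: &str) -> Option<String> {
    let _guard = ENV_LOCK.lock().unwrap();
    env::var(key).ok()
}

/// Write a variable while holding the lock.
#[allow(unused_unsafe)] // set_var is only `unsafe` from the 2024 edition on
fn locked_set(key: &str, value: &str) {
    let _guard = ENV_LOCK.lock().unwrap();
    // SAFETY (assumption): nothing else in this process touches the
    // environment without taking ENV_LOCK first.
    unsafe { env::set_var(key, value) };
}

/// One thread keeps adding variables (growing the environment, which is
/// what triggers the reallocation described above) while another keeps
/// reading. The lock serializes the two.
fn run_demo() -> Option<String> {
    let writer = thread::spawn(|| {
        for i in 0..1_000 {
            locked_set(&format!("DEMO_VAR_{i}"), "x");
        }
    });
    let reader = thread::spawn(|| {
        for _ in 0..1_000 {
            let _ = locked_get("PATH");
        }
    });
    writer.join().unwrap();
    reader.join().unwrap();
    locked_get("DEMO_VAR_999")
}

fn main() {
    println!("{:?}", run_demo());
}
```

The takeaway matches the post: a lock only helps if every reader and writer agrees to use it. Once a C library reads the environment on its own, the safest option is to set variables before any threads start, or not at all.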