Stack smashing protection, commonly implemented with stack canaries, is a mitigation against memory corruption on the stack. The compiler places a special value called the canary, random per process, in each protected stack frame. If the value in the frame differs from the saved reference value when the function is about to return, the program aborts.
Compilers implement this entirely in generated code; no special hardware support is required. Stack canaries are very effective against contiguous buffer overflows on the stack. To defeat them, an attacker needs an indexed write primitive that skips over the canary, or a leak of the canary value.
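The check described above can be sketched as a small model in C. This is not what a compiler actually emits: struct model_frame, write_and_check and the constant 0x1000 are hypothetical names and values chosen for illustration, and the "frame" is an ordinary struct so that the overflow stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one protected stack frame, lowest address first. */
struct model_frame {
    char     buf[16];    /* local buffer that receives attacker data */
    uint64_t canary;     /* guard value between the locals and the return address */
    uint64_t saved_ret;  /* stand-in for the saved return address */
};

/* Copy n bytes into the frame starting at buf, as an unchecked copy would.
   Returns 1 if the epilogue check would pass, 0 if the program would abort. */
static int write_and_check(const void *src, size_t n, uint64_t reference)
{
    struct model_frame f;

    assert(n <= sizeof f);         /* keep the model itself in bounds */
    memset(&f, 0, sizeof f);
    f.canary = reference;          /* prologue: store the canary in the frame */
    f.saved_ret = 0x1000;          /* arbitrary placeholder value */

    memcpy((char *)&f, src, n);    /* contiguous write from buf toward saved_ret */

    return f.canary == reference;  /* epilogue: compare before returning */
}
```

In this model, a write of up to 16 bytes leaves the canary intact, while any longer contiguous write has to pass through the canary before it can reach saved_ret, which is why the check catches it.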
On AArch64, however, GCC's protection does not detect or defend against overflows of dynamically sized stack allocations, such as variable-length arrays (VLAs), alloca() buffers, or any other user-controlled growth of the stack frame.
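For readers unfamiliar with the pattern, here is a minimal sketch of a dynamically sized stack variable with an unchecked copy. It is not the PoC from the original post; first_byte_via_vla and its parameters are hypothetical, and the function is only safe to call when copy_len does not exceed alloc_len and alloc_len is nonzero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: both lengths are assumed to be caller-controlled. */
static unsigned char first_byte_via_vla(const unsigned char *src,
                                        size_t alloc_len, size_t copy_len)
{
    unsigned char buf[alloc_len];  /* VLA: its size is only known at run time */

    /* BUG (deliberate): copy_len is never checked against alloc_len, so a
       larger copy_len writes past the end of the dynamic allocation. */
    memcpy(buf, src, copy_len);

    return buf[0];
}
```

A buffer obtained from alloca() behaves the same way for the purposes of this issue: its size is chosen at run time and it lives in the dynamic part of the frame.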
Why does this only happen on AArch64? The GCC backend for this target lays out the stack frame differently than on other architectures. Instead of saving the return address (the LR register) at the top of the stack frame (highest address), it saves it near the bottom of the frame. Since overflows write toward higher addresses, an overflow of a local variable cannot reach the frame's own saved LR. Hurray!
This feels like a feature: if there is no saved LR within reach, who needs a canary? In practice, there is always another stack frame (the caller's) further along, with its own saved LR. So, with a larger overflow, an attacker can still modify a saved LR and other items on the stack.
The stack frame diagram in the GCC source code does not show a stack canary. Why is this? The canary is treated as just another local variable, which the compiler adds to the frame at compile time. The compiler assumes that all locals occupy a single contiguous region, but that assumption does not hold on AArch64.
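Dynamic allocations live at the very bottom of the stack frame, below the saved registers, while the canary sits with the other locals above them. This means there is no guard between a dynamic allocation and the saved LR in this frame layout, so an overflow from one reaches the other without touching the canary.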
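The layout argument can be illustrated with a simplified model. The struct below is an assumption based only on the description above (dynamic area lowest, then saved FP and LR, then locals with the canary); real frames contain more regions and padding. All names, sizes and constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical model of the frame, lowest address first. */
struct aarch64_model_frame {
    char     dynamic[16];  /* alloca()/VLA area at the bottom of the frame */
    uint64_t saved_fp;     /* saved frame pointer */
    uint64_t saved_lr;     /* saved return address */
    char     locals[16];   /* fixed-size local variables */
    uint64_t canary;       /* guard, placed with the locals */
};

struct overflow_result {
    int lr_corrupted;      /* 1 if saved_lr was overwritten */
    int canary_intact;     /* 1 if the epilogue check would still pass */
};

/* Write n bytes of 'A' upward from byte offset start and report the damage. */
static struct overflow_result overflow_from(size_t start, size_t n)
{
    const uint64_t reference = 0x1122334455667788ULL;  /* placeholder canary */
    const uint64_t ret_addr  = 0x400000;               /* placeholder address */
    struct aarch64_model_frame f;
    struct overflow_result r;

    assert(start <= sizeof f && n <= sizeof f - start);  /* stay inside the model */
    memset(&f, 0, sizeof f);
    f.saved_lr = ret_addr;
    f.canary   = reference;

    memset((char *)&f + start, 'A', n);  /* contiguous overflow toward higher addresses */

    r.lr_corrupted  = (f.saved_lr != ret_addr);
    r.canary_intact = (f.canary == reference);
    return r;
}
```

In this model, a 32-byte write starting at the dynamic area overwrites saved_lr and leaves the canary untouched, whereas a 24-byte write starting at the fixed locals trips the canary and never reaches this frame's saved_lr.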
The post ends with a very simple PoC in which the overflow crashes the program but is never caught by the stack protector. Overall, a really strange yet impactful bug in GCC. An assumption about how every stack frame is laid out broke the effectiveness of the canary code. Neat!