Reproducing the AWS Outage Race Condition with a Model Checker
Oct 30, 2025AWS published a post-mortem about a recent outage [1]. Big systems like theirs are complex, and when you operate at that scale, things sometimes go wrong. Still, AWS has an impressive record of reliability.The post-mortem mentioned a race condition, which caught my eye. I don’t know all the details of AWS’s internal setup, but using the information in the post-mortem and a few assumptions, we can try to reproduce a simplified version of the problem.As a small experiment, we’ll use a ...
Read more at wyounas.github.io