At a glance, a good share of outages are caused by bad configuration changes: the 2021 global Facebook outage, the bad configuration that cost Knight Capital $440 million in 2012, and numerous global outages at Google Cloud, Microsoft Azure, Cloudflare, and other companies with serious engineering cultures. Why do configuration changes cause so many outages?

What helps prevent outages due to bad configuration?

But configuration change outages are far from a solved problem.