Reducing the Impact of Service Outages with Generic Mitigations: A Philosophy of Duct-Tape-Based Outage Resolution
This webcast has been postponed.
Your service should have at least one or two generic mitigations. If it doesn't, you're in for a bad time. If it does, treasure them, maintain them, and use them, lest they rot beneath your feet.
While a mitigation is any action you might take to reduce the impact of a breakage (SSHing into an instance and clearing the cache, for example, or turning off the machines to close down a vulnerability), a generic mitigation is useful in addressing a wide variety of outages. In this talk you will learn how to distinguish between specific and generic mitigations, and how to identify what generic mitigations your service might need. You'll also understand why you need to build generic actions you can trust to "make it stop!"
Jennifer Mace is a Site Reliability Engineer at Google. She draws on years of experience addressing production problems before they begin, and providing big red buttons to safely stop problems when they are detected.