Spotlight on Cloud: Reducing the Impact of Service Outages with Generic Mitigations with Jennifer Mace
A philosophy of duct-tape-based outage resolution
Topic: Web Ops & Performance
While a mitigation is any action you might take to reduce the impact of a breakage—such as SSHing into an instance and clearing the cache or switching off machines to close down a vulnerability—generic mitigations are actions that first responders can take even before the root cause is fully understood. As such, they’re useful for addressing a wide variety of outages. Every service should employ at least one or two generic mitigations to minimize outage impacts.
Join us for this edition of Spotlight on Cloud as Jennifer Mace, site reliability engineer at Google, shows you how to distinguish between specific and generic mitigations and how to identify what generic mitigations your service might need. You’ll also learn why you need to build generic actions you can trust to “make it stop!”
O’Reilly Spotlight explores emerging business and technology topics and ideas through a series of one-hour interactive events. You’ll engage in a live conversation with experts, sharing your questions and ideas while hearing their unique perspectives, insights, fears, and predictions for the future.
In every edition of Spotlight on Cloud, you’ll learn about, discuss, and debate the complex, ever-evolving world of the cloud. Best of all, you’ll discover how successful companies have adopted and embraced this massive network of shared information and how you can follow their lead to transform your organization and prepare for the Next Economy.
What you'll learn and how you can apply it
By the end of this live show, you’ll better understand:
- How generic mitigations can address a wide variety of outages
- How to distinguish between specific and generic mitigations
- How to identify what generic mitigations your service might need
This training course is for you because...
- You're a site reliability engineer, DevOps practitioner, or technical team lead who needs to monitor services and minimize any potential impact of outages.
Prerequisites
- Come with your questions for Jennifer Mace
- Have a pen and paper handy to capture notes, insights, and inspiration
About your instructor
Jennifer Mace is a site reliability engineer at Google. She draws on years of experience addressing production problems before they begin and providing big red buttons to safely stop problems when they’re detected.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Wednesday, September 25, 2019, at 10:00am PT / 1:00pm ET
- Introduction and presentation (45 minutes)
- Interactive discussion and Q&A (15 minutes)