November 2020
Beginner to intermediate
250 pages
7h 41m
English
Any SRE tasked with the care and feeding of even the most modest of services will, inevitably, have a Bad Day™. After the incident plays out, we find ourselves party to a postmortem. Operational retrospectives—a more accurate name for what we software developers and operations engineers practice, unless your outage resulted in actual death—are likely not new to you. What may be news is the interest in the concept of safety and the mechanics of how we learn from incidents in software as individuals, teams, and whole organizations.
Following are a few insights that we software safety nerds have uncovered and are actively studying in an attempt to help us all learn more from these impactful events: