Failure is not falling down but refusing to get back up.
Disasters and major outages happen. Everyone in the company, from the top down, needs to recognize that fact and adopt a mindset that accepts outages and learns from them. An operations organization needs to be able to handle outages well and avoid repeating past mistakes.
Earlier chapters examined technology for staying resilient to failures and outages, as well as organizational strategies such as oncall. In this chapter we discuss disaster preparedness at the individual, team, procedural, and organizational levels. People must be trained until they know the procedures well enough to execute them with confidence. Teams need ...