Chapter 2. Old-View Thinking

We’ve all seen the same problems repeat themselves. Recurring small incidents, severe outages, and even data loss are stories many in IT commiserate over in the halls of tech conferences and the forums of Reddit. It’s the nature of building complex systems. Failure happens. However, in many cases we fall into the common trap of repeating the same process over and over again, expecting the results to be different. We investigate and analyze problems using techniques that have been well established as “best practices.” We always feel like we can do it better. We think we are smarter than others, or maybe our previous selves, yet the same problems continue to occur, and as systems grow, so does the frequency of the problems. Attempts at preventing problems always seem to be an exercise in futility. Teams become used to chaotic and reactive responses to IT problems, unaware that the way it was done in the past may no longer apply to modern systems.

We have to change the way we approach the work. Sadly, in many cases, we don’t have the authority to change how we do our jobs. Tool and process decisions are made at the top. Directed by senior leaders, we fall victim to routine, and no one ever stops to ask whether what we are doing is actually helping.

Traditional techniques of post-incident analysis have had minimal success in providing greater availability and reliability of IT services.

In Chapter 6 ...
