Chapter 14. The Dickerson Hierarchy of Reliability (A Good Place to Start)

In Chapter 4, I discussed some of the reasons organizations turn to SRE in the first place. By far the most common reason is that they have experienced a spate of bad reliability weather—a cluster of outages, perhaps public and embarrassing. In the least worst case, the news of someone else’s reliability issues has traveled to management, and they are scared. The organization is motivated and is ready for someone to take on reliability challenges. Cue the epic trailer music.

I’ve had the pleasure of talking with a sizable number of people who were at exactly this point, right at the cusp of their organization’s entry into SRE. Even though they were just getting started, their biggest issue wasn’t finding work that contributed to the reliability of their systems. That was easy. There were so many possible things they could be working on. It wasn’t a matter of finding some low-hanging fruit they could start with; they couldn’t walk without kicking metaphorical fruit. Their biggest problem was figuring out where to start and in what order they should approach their bounty of possibilities for the greatest possible impact.

In this chapter, I’d like to share with you the best answer I’ve heard to this conundrum and the way I usually discuss this map for getting started. At the end, I’ll also mention a few of the paths I’ve seen people take that are tempting but highly prone to failure.

The Dickerson Hierarchy ...
