Chapter 3. Overview of Principles

In the early days of Chaos Engineering at Netflix, it was not obvious what the discipline actually was. There were some catchphrases about pulling out wires or breaking things or testing in production, many misconceptions about how to make services reliable, and very few examples of actual tools. The Chaos Team was formed to create a meaningful discipline, one that proactively improved reliability through tooling. We spent months researching Resilience Engineering and other disciplines in order to come up with a definition and a blueprint for how others could also participate in Chaos Engineering. That definition was put online as a sort of manifesto, referred to as “The Principles.” (See the Introduction: Birth of Chaos for the story of how Chaos Engineering came about.)

As is the case with any new concept, Chaos Engineering is sometimes misunderstood. The following sections explore what the discipline is and what it is not. The gold standard for the practice is captured in the section “Advanced Principles.” Finally, we take a look at what factors could change the principles going forward.

What Chaos Engineering Is

“The Principles” defines the discipline so that we know when we are doing Chaos Engineering, how to do it, and how to do it well. The common definition today for Chaos Engineering is “The facilitation of experiments to uncover systemic weaknesses.” “The Principles” website outlines the steps of the experimentation as follows:
