Chapter 13. ROI of Chaos Engineering

“No one tells the story of the incident that didn’t happen.”

John Allspaw

Chaos Engineering is a pragmatic discipline designed to provide value to a business. One of the most difficult aspects of running a successful Chaos Engineering practice is proving that the results have business value. This chapter enumerates the difficulties of establishing a connection to business value, describes a model for methodically pursuing return on investment (ROI) called the Kirkpatrick Model, and provides an objective example of establishing ROI taken from Netflix’s experience with ChAP, the Chaos Automation Platform.

Ephemeral Nature of Incident Reduction

Imagine that you measure the uptime of your service in some consistent way, and find that you have two nines (99%) of uptime. You implement a Chaos Engineering practice, and subsequently the system demonstrates three nines (99.9%) of uptime. How do you prove that the Chaos Engineering practice should be credited and not some contemporaneous change? That attribution issue is a hard problem.
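To make the size of that jump concrete, the following minimal sketch converts a count of nines into an availability fraction and a yearly downtime budget. The function names and the 365-day year are assumptions made for illustration; they do not come from the chapter.

```python
# Illustrative sketch only: names and the 365-day year are assumptions.
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def availability_for_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 2 nines -> about 0.99."""
    return 1 - 10 ** (-nines)


def allowed_downtime_hours(nines: int) -> float:
    """Return the yearly downtime budget, in hours, for a number of nines."""
    return HOURS_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in (2, 3):
        print(
            f"{n} nines: {availability_for_nines(n):.3%} available, "
            f"about {allowed_downtime_hours(n):.2f} hours of downtime per year"
        )
```

Under these assumptions, moving from two nines to three shrinks the yearly downtime budget tenfold, from roughly 87.6 hours to roughly 8.76 hours. The arithmetic shows the size of the improvement, but it says nothing about what caused it, which is the attribution problem described above.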

There is another confounding obstacle: the nature of the improvement that Chaos Engineering provides is self-limiting, because the most obvious benefits are ephemeral. Instead of establishing long-lasting benefits to system safety, improvements triggered by Chaos Engineering tend to open the gates to other business pressures. If Chaos Engineering improves availability, chances are good that the business will respond by releasing ...
