Chapter 1. Observability and Chaos
“You see, but you do not observe.”
Sherlock Holmes, from “A Scandal in Bohemia” by Sir Arthur Conan Doyle
Observability and chaos engineering are two relatively new disciplines that the mainstream has begun, for good reason, to recognize. The principles of observability turn your systems into inspectable and debuggable crime scenes, while chaos engineering encourages and leverages observability as it helps you preemptively discover and overcome system weaknesses.
In this chapter you’re going to learn how chaos engineering not only relies on observability but also, as a good citizen in your systems, needs to participate in your overall system observability picture.
The Value of Observability
Observability is a key characteristic of a successful system, particularly a production system. As systems evolve ever more rapidly, they become more complex and more susceptible to failure. Observability is what lets you take responsibility for such systems, because it gives you the means to interrogate, inspect, and piece together what happened, when, and, most importantly, why. Observability brings the power of data to bear on exploring and fixing issues and on improving products. “It’s not about logs, metrics, or traces, but about being data driven during debugging and using the feedback to iterate on and improve the product,” Cindy Sridharan writes.
Observability helps you effectively debug a running system without having to modify the system in any dramatic ...