Chapter 10. Implementing Chaos Engineering Observability

In this chapter you’re going to see how you can use existing Chaos Toolkit controls from the previous chapter to make your chaos experiments observable as they execute.

Observability is an important operational concern, because it helps you effectively debug a running system without having to modify the system in any dramatic way. You can think of observability as being a superset of system management and monitoring. Management and monitoring have traditionally been great at answering closed questions such as “Is that server responding?” Observability extends this power to answering open questions such as “Can I trace the latency of a user interaction in real time?” or “How successful was a user interaction that was submitted yesterday?”

When you execute automated chaos experiments, they too need to participate in the overall system’s observability picture. In this chapter you’re going to look at the following observability concerns and how they can be enabled for your own Chaos Toolkit chaos experiments:

  • Logging the execution of your chaos experiments

  • Distributed tracing of the execution of your chaos experiments
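
To make the logging concern concrete, here is a minimal sketch of a control module that writes a log line as an experiment and its activities start and finish. It assumes the Chaos Toolkit control hook names (`before_experiment_control`, `after_experiment_control`, `before_activity_control`, `after_activity_control`) and that the experiment, activity, journal, and run are passed in as plain dictionaries. The logger name and message formats are hypothetical and chosen only for illustration. This is not the control the chapter goes on to use.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("chaos_observability_sketch")


def before_experiment_control(context, **kwargs):
    """Log that an experiment is about to run; context is the experiment dict."""
    logger.info("experiment starting: %s", context.get("title", "<untitled>"))


def after_experiment_control(context, state, **kwargs):
    """Log the final status; state is assumed to be the journal dict."""
    logger.info(
        "experiment finished: %s (status=%s)",
        context.get("title", "<untitled>"),
        (state or {}).get("status", "unknown"),
    )


def before_activity_control(context, **kwargs):
    """Log each probe or action before it runs; context is the activity dict."""
    logger.info(
        "activity starting: %s [%s]",
        context.get("name", "<unnamed>"),
        context.get("type", "unknown"),
    )


def after_activity_control(context, state, **kwargs):
    """Log how the activity ended; state is assumed to be the run dict."""
    logger.info(
        "activity finished: %s (status=%s)",
        context.get("name", "<unnamed>"),
        (state or {}).get("status", "unknown"),
    )
```

If this were saved as an importable module, it could be referenced from an experiment's `controls` block using a provider of type `python`. Every log line would then be emitted from outside the system under test, so the experiment becomes observable without changing the target system.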
