Chapter 6. Chaos Engineering from Beginning to End

In Chapter 5 you worked through an entire cycle, from discovering system weaknesses to overcoming them, using a prepared automated chaos experiment. The Chaos Toolkit’s experiment definition format is designed for this sort of sharing and reuse (see Chapter 7), but in this chapter you’re going to build an experiment from first principles so that you can experience the whole journey.

To make the challenge a little more realistic, the weakness you’re going to explore in the target system in this chapter is multilevel in nature.

In Chapter 1 I introduced the many different areas of attack on resiliency, namely:

  • People, practices, and processes

  • Applications

  • Platform

  • Infrastructure

The experiment you’re going to create and run in this chapter will look for weaknesses in the platform and infrastructure areas, and even in the people area.

The Target System

As this experiment is going to examine weaknesses at the people level, you’ll need more than a simple technical description of the target system. You’ll still start, though, by enumerating the technical aspects of the system; then you’ll consider the people, processes, and practices that will also be explored for weaknesses as part of the whole sociotechnical system.

The Platform: A Three-Node Kubernetes Cluster

Technically, the target system is made up of a Kubernetes cluster that is once again running nothing more than a simple service. ...
