Chaos Engineering

by Casey Rosenthal, Lorin Hochstein, Aaron Blohowiak, Nora Jones, Ali Basiri

August 2017

Intermediate to advanced

71 pages

1h 22m

English

O'Reilly Media, Inc.

I. Introduction
1. Why Do Chaos Engineering?
  How Does Chaos Engineering Differ from Testing?
  It’s Not Just for Netflix
  Prerequisites for Chaos Engineering
2. Managing Complexity
  Understanding Complex Systems
  Example of Systemic Complexity
  Takeaway from the Example
II. The Principles of Chaos
  Experimentation
  Advanced Principles
3. Hypothesize about Steady State
  Characterizing Steady State
  Forming Hypotheses
4. Vary Real-World Events
5. Run Experiments in Production
  State and Services
  Input in Production
  Other People’s Systems
  Agents Making Changes
  External Validity
  Poor Excuses for Not Practicing Chaos
    I’m pretty sure it will break!
    If it does break, we’re in big trouble!
  Get as Close as You Can
6. Automate Experiments to Run Continuously
  Automatically Executing Experiments
  Automatically Creating Experiments
7. Minimize Blast Radius
III. Chaos In Practice

8. Designing Experiments
  1. Pick a Hypothesis
  2. Choose the Scope of the Experiment
  3. Identify the Metrics You’re Going to Watch
  4. Notify the Organization
  5. Run the Experiment
  6. Analyze the Results
  7. Increase the Scope
  8. Automate
9. Chaos Maturity Model
  Sophistication
  Adoption
  Draw the Map
10. Conclusion
  Resources

Overview

With so many interacting components, the number of things that can go wrong in a distributed system is enormous. You’ll never be able to prevent all possible failure modes, but you can identify many of the weaknesses in your system before real-world events trigger them. This report introduces you to Chaos Engineering, a method of experimenting on infrastructure that lets you expose weaknesses before they become a real problem.

Members of the Netflix team that developed Chaos Engineering explain how to apply these principles to your own system. By introducing controlled experiments, you’ll learn how emergent behavior from component interactions can cause your system to drift into an unsafe, chaotic state.

Hypothesize about steady state by collecting data on the health of the system
Vary real-world events, such as turning off servers or simulating regional failures
Run your experiments as close to the production environment as possible
Ramp up your experiment by automating it to run continuously
Minimize the effects of your experiments to keep from blowing everything up
Learn the process for designing Chaos Engineering experiments
Use the Chaos Maturity Model to map the state of your chaos program, including realistic goals

ISBN: 9781491988459