Learning Chaos Engineering

by Russ Miles

July 2019

Beginner

175 pages

3h 55m

English

O'Reilly Media, Inc.

Audience · What This Book Is About · What This Book Is Not About · About the Samples · Conventions Used in This Book · Using Code Examples · O’Reilly Online Learning · How to Contact Us · Acknowledgments
Chaos Engineering Defined · Chaos Engineering Addresses the Whole Sociotechnical System · Locations of Dark Debt · The Process of Chaos Engineering · The Practices of Chaos Engineering · Sandbox/Staging or Production? · Chaos Engineering and Observability · Is There a “Chaos Engineer”? · Summary
Start with Experiments? · Gathering Hypotheses · Incident Analysis · Sketching Your System · Capturing “What Could Possibly Go Wrong?” · Introducing Likelihood and Impact · Building a Likelihood-Impact Map · Adding What You Care About · Creating Your Hypothesis Backlog · Summary
What Is a Game Day? · Planning Your Game Day · Pick a Hypothesis · Pick a Style of Game Day · Decide Who Participates and Who Observes · Decide Where · Decide When and For How Long · Describe Your Game Day Experiment · Get Approval! · Running the Game Day · Consider a “Safety Monitor” · Summary
Installing Python 3 · Installing the Chaos Toolkit CLI · Summary
Setting Up the Sample Target System · A Quick Tour of the Sample System · Exploring and Discovering Evidence of Weaknesses · Running Your Experiment · Under the Skin of chaos run · Steady-State Deviation Might Indicate “Opportunity for Improvement” · Improving the System · Validating the Improvement · Summary
The Target System · The Platform: A Three-Node Kubernetes Cluster · The Application: A Single Service, Replicated Three Times · The People: Application Team and Cluster Administrators · Hunting for a Weakness · Naming Your Experiment · Defining Your Steady-State Hypothesis · Injecting Turbulent Conditions in an Experiment’s Method · Using the Kubernetes Driver from Your Method · Being a Good Citizen with Rollbacks · Bringing It All Together and Running Your Experiment · Overcoming a Weakness: Applying a Disruption Budget · Summary
Sharing Experiment Definitions · Moving Values into Configuration · Specifying Configuration Properties as Environment Variables · Externalizing Secrets · Scoping Secrets · Specifying a Contribution Model · Creating and Sharing Human-Readable Chaos Experiment Reports · Creating a Single-Experiment Execution Report · Creating and Sharing a Multiple Experiment Execution Report · Summary

Creating Your Own Custom Driver with No Custom Code · Implementing Probes and Actions with HTTP Calls · Implementing Probes and Actions Through Process Calls · Creating Your Own Custom Chaos Driver in Python · Creating a New Python Module for Your Chaos Toolkit Extension Project · Adding the Probe · Summary
Experiment “Controls” · Enabling Controls · Enabling a Control Inline in an Experiment · Enabling a Control Globally · Summary
Adding Logging to Your Chaos Experiments · Centralized Chaos Logging in Action · Tracing Your Chaos Experiments · Introducing OpenTracing · Applying the OpenTracing Control · Summary
Creating a New Chaos Toolkit Extension for Your Controls · Adding Your (Very) Simple Human Interaction Control · Skipping or Executing an Experiment’s Activity · Summary
What Is Continuous Chaos? · Scheduling Continuous Chaos Using cron · Creating a Script to Execute Your Chaos Tests · Adding Your Chaos Tests Script to cron · Scheduling Continuous Chaos with Jenkins · Grabbing a Copy of Jenkins · Adding Your Chaos Tests to a Jenkins Build · Scheduling Your Chaos Tests in Jenkins with Build Triggers · Summary
The Default Chaos Commands · Discovering What’s Possible with the chaos discover Command · Authoring a New Experiment with the chaos init Command · Checking Your Experiment with the chaos validate Command · Extending the Chaos Commands with Plug-ins

Content preview from Learning Chaos Engineering

Chapter 9. Chaos and Operations

If chaos engineering were just about surfacing evidence of system weaknesses through Game Days and automated chaos experiments, then life would be less complicated. Less complicated, but also much less safe!

In the case of Game Days, much safety can be achieved by executing the Game Day against a sandbox environment and ensuring that everyone—participants, observers, and external parties—is aware the Game Day is happening.¹

The challenge is harder with automated chaos experiments. Automated experiments could potentially be executed by anyone, at any time, and possibly against any system.² There are two main categories of operational concern when it comes to your automated chaos experiments (Figure 9-1):

Control: You or other members of your team may want to seize control of a running experiment. For example, you may want to shut it down immediately, or you may just want to be asked whether a particularly dangerous step in the experiment should be executed or skipped.
Observation: You want your experiment to be debuggable as it runs in production. You should be able to see what experiments are currently running, and what step they have just executed, and then trace that back to how other elements of your system are executing in parallel.
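
The two concerns above can be illustrated with a minimal Python sketch. It is not the Chaos Toolkit API or the book's implementation: the `Step` and `ExperimentRun` classes, the `decide` callback, and the `dangerous` flag are all hypothetical names invented for this illustration. The `decide` callback stands in for control (an operator is asked whether to run, skip, or abort a dangerous step), and the `events` list stands in for observation (a record of which step was just handled).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One activity in a hypothetical experiment."""
    name: str
    run: Callable[[], None]
    dangerous: bool = False  # dangerous steps are referred to the operator


@dataclass
class ExperimentRun:
    """A toy runner; an illustration only, not a real chaos tool."""
    title: str
    steps: List[Step]
    # Control: returns "run", "skip", or "abort" for a dangerous step.
    decide: Callable[[Step], str]
    # Observation: an ordered record of what the run has done so far.
    events: List[str] = field(default_factory=list)

    def execute(self) -> List[str]:
        self.events.append(f"start {self.title}")
        for step in self.steps:
            choice = self.decide(step) if step.dangerous else "run"
            if choice == "abort":
                self.events.append(f"aborted before {step.name}")
                break
            if choice == "skip":
                self.events.append(f"skipped {step.name}")
                continue
            step.run()
            self.events.append(f"executed {step.name}")
        self.events.append(f"end {self.title}")
        return self.events


# Usage: a scripted "operator" that declines every dangerous step.
touched: List[str] = []
run = ExperimentRun(
    title="demo",
    steps=[
        Step("probe-health", lambda: touched.append("probe")),
        Step("kill-instance", lambda: touched.append("kill"), dangerous=True),
    ],
    decide=lambda step: "skip",
)
events = run.execute()
```

In a real setting the `decide` callback might prompt a person, and the events might be sent to a central logging or tracing system rather than kept in memory; both of those choices are assumptions of this sketch.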

An image of the operational concerns associated with a running automated chaos experiment.

There ...


Publisher Resources

ISBN: 9781492050995


Figure 9-1. The control and observation operational concerns of a running automated chaos experiment
