Chapter 2. Managing Complexity

Complexity is a challenge and an opportunity for engineers. You need a team of people skilled and dynamic enough to successfully run a distributed system with many parts and interactions. The opportunity to innovate and optimize within the complex system is immense.

Software engineers typically optimize for three properties: performance, availability, and fault tolerance.


In this context refers to minimization of latency or capacity costs.


Refers to the system’s ability to respond and avoid downtime.

Fault tolerance

Refers to the system’s ability to recover from any undesirable state.

An experienced team will optimize for all three of these qualities simultaneously.

At Netflix, engineers also consider a fourth property:

Velocity of feature development

Describes the speed with which engineers can provide new, innovative features to customers.

Netflix explicitly makes engineering decisions based on what encourages feature velocity throughout the system, not just in service to the swift deployment of a local feature. Finding a balance between all four of these properties informs the decision-making process when architectures are planned and chosen.

With these properties in mind, Netflix chose to adopt a microservice architecture. Let us remember Conway’s Law:

Any organization that designs a system (defined broadly) will inevitably produce a design whose structure is a copy of the organization’s communication structure. ...

Get Chaos Engineering now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.