Practice of Cloud System Administration, The: DevOps and SRE Practices for Web Services, Volume 2 by Christina J. Hogan, Strata R. Chalup, Thomas A. Limoncelli

Chapter 6. Design Patterns for Resiliency

Success is not final, failure is not fatal: it is the courage to continue that counts.

—Winston Churchill

Resiliency is a system’s ability to constructively deal with failures. A resilient system detects failure and routes around it. Nonresilient systems fall down when faced with a malfunction. This chapter is about software-based resiliency and documents the most common techniques used.

Resiliency is important because no one goes to a web site that is down. Hardware fails; that is a fact of life. You can buy the most reliable, expensive hardware in the world and there will still be some failures. In a sufficiently large system, a one-in-a-million failure is a daily occurrence.
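
The arithmetic behind the "one-in-a-million" claim can be sketched in a few lines. This is an illustrative calculation only: the function names and the operation counts are hypothetical, and it assumes each operation fails independently with the same probability, which real systems only approximate.

```python
def expected_failures_per_day(operations_per_day: int, failure_probability: float) -> float:
    """Expected number of failures per day, assuming independent operations."""
    return operations_per_day * failure_probability


def probability_of_at_least_one_failure(operations_per_day: int, failure_probability: float) -> float:
    """Chance that at least one operation fails in a day (same independence assumption)."""
    return 1.0 - (1.0 - failure_probability) ** operations_per_day


if __name__ == "__main__":
    one_in_a_million = 1e-6
    # Hypothetical daily operation counts for systems of increasing size.
    for operations in (10_000, 1_000_000, 100_000_000):
        expected = expected_failures_per_day(operations, one_in_a_million)
        at_least_one = probability_of_at_least_one_failure(operations, one_in_a_million)
        print(f"{operations:>11,} ops/day: expected failures = {expected:g}, "
              f"P(at least one) = {at_least_one:.4f}")
```

Under these assumptions, a system handling a million operations a day should expect about one such failure daily, and at a hundred million operations a day the rare event happens roughly a hundred times a day.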

During the first year ...
