Chapter 6. Design Patterns for Resiliency
Success is not final, failure is not fatal: it is the courage to continue that counts.
—Winston Churchill
Resiliency is a system’s ability to constructively deal with failures. A resilient system detects failure and routes around it. Nonresilient systems fall down when faced with a malfunction. This chapter is about software-based resiliency and documents the most common techniques used.
Resiliency is important because no one goes to a web site that is down. Hardware fails; that is a fact of life. You can buy the most reliable, expensive hardware in the world and there will still be some number of failures. In a sufficiently large system, a one-in-a-million failure is a daily occurrence.
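The arithmetic behind the one-in-a-million remark can be sketched in a few lines. This is only an illustration: the daily operation count is a made-up figure, the function names are hypothetical, and the calculation assumes each operation fails independently with the same probability.

```python
# Illustrative assumptions only; these numbers do not come from the text.
FAILURE_PROBABILITY = 1 / 1_000_000   # a "one-in-a-million" event per operation
OPERATIONS_PER_DAY = 50_000_000       # assumed daily operation count for a large system


def expected_failures(operations: int, probability: float) -> float:
    """Expected number of failures, assuming independent operations."""
    return operations * probability


def chance_of_at_least_one(operations: int, probability: float) -> float:
    """Probability that at least one failure occurs among the operations."""
    return 1.0 - (1.0 - probability) ** operations


if __name__ == "__main__":
    print("expected failures per day:",
          expected_failures(OPERATIONS_PER_DAY, FAILURE_PROBABILITY))
    print("chance of at least one failure per day:",
          chance_of_at_least_one(OPERATIONS_PER_DAY, FAILURE_PROBABILITY))
```

Under these assumptions the system sees about 50 such failures per day, and a day with none is vanishingly unlikely. Even at one million operations per day, the chance of at least one failure is roughly 63 percent, which is why rare events have to be planned for rather than ignored.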
During the first year ...