Amazon Web Services in Action

Chapter 13. Designing for fault-tolerance

This chapter covers

What fault-tolerance is and why you need it
Using redundancy to remove single points of failure
Retrying on failure
Using idempotent operations to make retrying on failure safe
AWS service guarantees

Failure is inevitable: hard disks, networks, power supplies, and so on will all fail eventually. Fault-tolerance deals with that problem. A fault-tolerant system is built for failure. If a failure occurs, the system isn’t interrupted, and it continues to handle requests. If your system has a single point of failure, it’s not fault-tolerant. You can achieve fault-tolerance by introducing redundancy into your system and by decoupling the parts of your system in such a way that one side doesn’t rely on the uptime of ...
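To make the idea of redundancy concrete, here is a minimal sketch, not taken from the book, of a client that keeps handling requests as long as at least one redundant replica is healthy. It assumes each replica can be modeled as a plain Python callable; the names `call_with_failover`, `AllReplicasFailed`, `broken_replica`, and `healthy_replica` are hypothetical. In a real system the replicas would be independent servers, ideally in separate data centers, so that one hardware or network failure cannot take them all down at once.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Hypothetical error: raised only when every redundant replica has failed."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each redundant replica in order and return the first successful result.

    A single failing replica does not interrupt the caller; the request is
    simply handed to the next replica. Only if all replicas fail does the
    caller see an error, so no single replica is a single point of failure.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a failed replica is expected, not fatal
            errors.append(exc)
    raise AllReplicasFailed(errors)


def broken_replica() -> str:
    # Hypothetical replica whose disk, network, or power has failed.
    raise ConnectionError("replica unavailable")


def healthy_replica() -> str:
    # Hypothetical replica that is still up and serving requests.
    return "response from healthy replica"


result = call_with_failover([broken_replica, healthy_replica])
print(result)
```

Trying the next replica is only safe when the operation may be repeated without side effects, which is why retrying on failure goes hand in hand with idempotent operations.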

Amazon Web Services in Action by Andreas Wittig, Michael Wittig
