Chapter 10. Fault Tolerance

HARDWARE FAILS. Over the years I have had basically every major hardware component on a server fail, from CPUs to RAM to SCSI controllers and, of course, hard drives. In addition to hardware failure, system downtime is often the result of some other problem, such as a bad configuration on a switch, a power outage, or even a sysadmin accidentally rebooting the wrong server. If you lose money whenever a service is down, you quickly come up with methods to keep that service up no matter what component fails.

In this chapter I discuss some of the methods you can use with Ubuntu servers to make them more fault-tolerant. I start with some general fault tolerance principles. Then I talk about ways to add fault tolerance to your ...
