Chapter 10. Fault Tolerance

HARDWARE FAILS. Over the years I have had basically every major hardware component on a server fail, from CPUs to RAM to SCSI controllers and, of course, hard drives. In addition to hardware failure, system downtime is often the result of some other problem such as a bad configuration on a switch, a power outage, or even a sysadmin accidentally rebooting the wrong server. If you lose money whenever a service is down, you quickly come up with methods to keep that service up no matter what component fails.

In this chapter I discuss some of the methods you can use with Ubuntu servers to make them more fault-tolerant. I start with some general fault tolerance principles. Then I talk about ways to add fault tolerance to your ...
