Hack #78. Avoid Catastrophic Disk Failure

Access your hard drive's built-in diagnostics using Linux utilities to predict and prevent disaster.

Nobody wants to walk in after a power failure and discover that, on top of everything else, a dead hard drive means rebuilding entire servers and restoring backed-up data from tape. The best way to avoid this situation is to be alerted when something is amiss with your SCSI or ATA hard drive before it finally fails. Ideally the alert would come straight from the hard drive itself, but until we're able to plug an RJ-45 directly into a hard drive we'll have to settle for the next best thing, which is the drive's built-in diagnostics. For several years now, ATA and SCSI drives have supported a standard mechanism for disk diagnostics called Self-Monitoring, Analysis, and Reporting Technology (SMART), which is aimed at predicting hard drive failures. It wasn't long before Linux had utilities to poll hard drives for this vital information.

The smartmontools project (http://smartmontools.sourceforge.net) produces a SMART monitoring daemon called smartd and a command-line utility called smartctl, which can do most things on demand that the daemon does in the background periodically. With these tools, along with standard Linux filesystem utilities such as debugfs and tune2fs, there aren't many hard drive issues you can't fix.
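
To make the idea of polling on demand concrete, here is a minimal Python sketch that wraps smartctl's health check (the -H option) and reduces its report to a single verdict. The function names parse_health and check_drive are hypothetical, and the two status lines the parser looks for are assumptions based on typical smartctl output for ATA and SCSI drives; the wording can differ between smartmontools versions and drive types, so treat this as an illustration rather than a robust monitor.

```python
import subprocess


def parse_health(output: str) -> str:
    """Reduce the text printed by `smartctl -H` to 'passed', 'failed', or 'unknown'.

    Assumption: ATA drives print a line such as
        SMART overall-health self-assessment test result: PASSED
    and SCSI drives print a line such as
        SMART Health Status: OK
    """
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("SMART overall-health self-assessment test result:"):
            verdict = line.split(":", 1)[1].strip()
            return "passed" if verdict == "PASSED" else "failed"
        if line.startswith("SMART Health Status:"):
            verdict = line.split(":", 1)[1].strip()
            return "passed" if verdict == "OK" else "failed"
    # No recognizable status line: SMART may be disabled or unsupported.
    return "unknown"


def check_drive(device: str) -> str:
    """Run `smartctl -H` on one device and return the parsed verdict.

    smartctl uses its exit status to flag problems, so a non-zero exit
    is not treated as an error here; only the printed report is parsed.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_health(result.stdout)
```

Calling check_drive("/dev/sda") as root would query the first disk on many systems; the device name is only an example. This covers just the one-off case. For periodic checks in the background, smartd is the tool meant for the job.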

But before you can repair anything or transform yourself into a seemingly superpowered hard-drive ...
