Hack #88. Keep Tabs on Systems and Services

Consolidate home-grown monitoring scripts and mechanisms using Nagios.

Monitoring is a key task for administrators, whether you're in a small environment of 50–100 servers or are managing many sites globally with 5,000 servers each. At some point, trying to keep up with the growth in the number of new services and servers deployed, and reflecting changes across many disparate monitoring solutions, becomes a full-time job!

Admins often monitor not only the availability of a system (using simple tools such as ping), but also the health of the services running on the system—the network devices that connect the systems to each other; peripheral devices such as printers, copiers, uninterruptible power supplies (UPSs); and even air conditioners and other equipment. Often, these tools perform simple connections to services and use SNMP and rstatd data collection and specialized environmental monitoring devices to gain a complete view of the data centers.

While there are plenty of solutions out there for collecting and aggregating this data in some sane way, I've found Nagios to provide the perfect balance between simplicity and power. Nagios is a solution that meets the requirements of our mid-sized organization's computing environment quite well, for reasons such as these:

Dependency checking

If all your printers are on a single switch, and that switch goes down, would you rather get a page about each of 50 unavailable printers or a single page ...

Get Linux Server Hacks, Volume Two now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.