CHAPTER 7

Monitoring with Nagios and Trend Analysis with Cacti

Monitoring is one of the most important parts of infrastructure management. When systems go down, monitoring should alert the site reliability engineers (SREs) so they can investigate the affected service and bring the system back online. Afterward, a root cause analysis should be conducted and actions taken to prevent similar issues in the future. Ideally, monitoring alerts on issues before they cause a service outage.

Trend analysis is the ability to view historical and current metrics for a given application or system. Trend analysis can help in ...
