Chapter 8

Monitoring and prevention

Abstract

In this chapter, the reader will learn about situational awareness and about active, proactive, and reactive methods for building a full understanding of the system (the bird’s-eye view) and making sound, data-driven decisions. The reader will also learn why it is important to monitor and process significant data, and how to avoid pitfalls and false positives in data trends. Finally, the chapter addresses monitoring and auditing facilities, and the correlation between environment and system events.

Keywords

monitoring
trend
report
log
sar
audit
nagios
zabbix
Our work so far has been focused on investigating problems and following industry best practices of problem solving. In a large ...
