Cluster monitoring and health checking
Monitoring is an important task that helps administrators maintain cluster resources and keep the cluster available to its users. It involves resource control and health-check tasks on the various nodes (compute, login, and service nodes), networking devices, and storage systems.
This chapter introduces some tools and resources that can be employed to support monitoring of a high-performance computing (HPC) cluster.
This chapter includes the following topics:

Get POWER8 High-performance Computing Guide IBM Power System S822LC (8335-GTB) Edition now with O’Reilly online learning.