Cluster monitoring and health checking
Monitoring is an important task that helps maintain cluster resources and ensures that the cluster remains serviceable to its users. It involves resource control and health checks on the various nodes (compute, login, and service), networking devices, and storage systems.
This chapter introduces some tools and resources that can be employed to support monitoring of a high-performance computing (HPC) cluster.
This chapter includes the following topics:
