Cluster monitoring
Monitoring helps maintain cluster resources and ensures that the cluster remains serviceable to end users. It consists of resource control and health check tasks on the various nodes (compute, login, and services), networking devices, and storage systems.
This topic is vast and deserves an entire book of its own. Instead of covering the subject in detail, this chapter introduces some tools and resources that can be used to support monitoring of a high performance computing (HPC) cluster.
This chapter ...

