Chapter 6. In-Cluster Monitoring Stack

The best platform is of no use if it is not fully functional. To make sure a platform is running at its best, there are mechanisms that can be utilized (i.e., software or instrumentation), to perform “self-healing.” However, there are occasions where these mechanisms fail or still need manual intervention for a variety of reasons. Additionally, it is important to collect various signals from the platform to get a good overview of it; this is called “observability.”

In this chapter, we discuss OpenShift’s built-in mechanisms and functionality for monitoring and observability. We detail each component that generates signals within the mechanisms and instruct you on how to work with them, before we dive deeper into strategies and best practices for monitoring, alerting, and observability in Chapter 7.

Cluster Monitoring Operator

The Cluster Monitoring Operator is at the heart of the monitoring platform. Chapter 9 discusses operators in depth, but we briefly touch on them here.

An operator ensures that a given state is always present on the cluster in a control loop. In the case of the Cluster Monitoring Operator, this means that all parts of the monitoring platform are present on the cluster, up-to-date, configured as specified, and running. The Cluster Monitoring Operator is managed by the Cluster Version Operator, which takes cares of all cluster operators. You can get a full list of all cluster operators by executing the following: ...

Get Operating OpenShift now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.