Chapter 11. Monitoring

The term observability is often used to describe a desirable attribute of distributed systems. Observability means having visibility into the various components of a system in order to detect, predict, and perhaps even prevent the complex failures that can occur in distributed systems. Failures in individual components can affect other components in turn, and multiple failures can interact in unforeseen ways, leading to system-wide outages. Common elements of an observability strategy for a system include metrics, logging, and tracing.

In this chapter, you’ll learn how Cassandra supports these elements of observability and how to use available tools to monitor and understand important events in the life cycle of your Cassandra cluster. We’ll look at some simple ways to see what’s going on, such as changing the logging levels and understanding the output.

To begin, let’s discuss how Cassandra uses the Java Management Extensions (JMX) to expose information about its internal operations and allow the dynamic configuration of some of its behavior. That will give you a basis to learn how to monitor Cassandra with various tools.

Monitoring Cassandra with JMX

Cassandra makes use of JMX to enable remote management of your nodes. JMX started as Java Specification Request (JSR) 160 and has been a core part of Java since version 5.0. You can read more about the JMX implementation in Java by examining the java.lang.management package.

JMX is a Java API that provides ...

Get Cassandra: The Definitive Guide, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.