Chapter 6. Troubleshooting Ganglia

Bernard Li

Daniel Pocock

Overview

Sooner or later, you may encounter a problem with the Ganglia infrastructure. Because Ganglia has a distributed architecture, it is not always obvious which component is at fault. Sometimes the fault lies completely outside the scope of the Ganglia system: a DNS issue, a faulty network card, or even a misconfigured web browser that leads a user to conclude, wrongly, that the Ganglia reports are broken.

This chapter aims to provide a systematic way of categorizing the faults, investigating them, identifying which component is responsible, rectifying the issue, and, if necessary, communicating details of the issue to the Ganglia community for discussion on the mailing list or registration in the bug database.

Known Bugs and Other Limitations

There are a number of known bugs and other limitations in the Ganglia system. For example, Ganglia depends on the system clock, and meaningful data will not be collected and reported if the clocks on the cluster machines, data collectors, and web server machines are not synchronized. This is a limitation of the Ganglia design, but it is not considered a bug.
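Because clock drift can silently corrupt the data, it is worth ruling out before digging deeper. The following is a minimal sketch, not part of Ganglia: it assumes you have already gathered each machine's current epoch time by some other means (for example, over SSH), and the function name `find_skewed_hosts` and the 5-second default threshold are hypothetical choices for illustration.

```python
from typing import Dict, List


# Hypothetical helper (not a Ganglia API): flags hosts whose clock differs
# from a reference clock by more than max_skew_seconds. The default
# threshold is an assumption, not a value defined by Ganglia.
def find_skewed_hosts(host_times: Dict[str, float],
                      reference_time: float,
                      max_skew_seconds: float = 5.0) -> List[str]:
    """Return the sorted names of hosts whose clock offset from
    reference_time exceeds max_skew_seconds in either direction."""
    return sorted(
        host for host, t in host_times.items()
        if abs(t - reference_time) > max_skew_seconds
    )


if __name__ == "__main__":
    # Made-up epoch timestamps, as each machine might report its own clock.
    samples = {
        "node01": 1_700_000_000.0,
        "node02": 1_700_000_002.5,
        "node03": 1_699_999_940.0,  # about a minute behind
        "web01": 1_700_000_030.0,   # 30 seconds ahead
    }
    print(find_skewed_hosts(samples, reference_time=1_700_000_000.0))
    # prints ['node03', 'web01']
```

Any host this flags should have its time synchronization (for example, NTP) checked before the Ganglia components themselves are investigated.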

Another known issue, fixed just before this book went to press (in Ganglia 3.3.7 and later), is that Ganglia did not work in a Solaris zone or container environment. On earlier versions, the issue can also be worked around by disabling the network module.

To save time troubleshooting, you may wish to peruse the list of ...
