Chapter 3. Scalability

Daniel Pocock

Bernard Li

Who Should Be Concerned About Scalability?

Scalability is discussed early in this book because it needs to be factored in at the planning stage rather than later on when stability problems are observed in production.

Scalability is not just about purchasing enough disk capacity to store all the RRD files. Particular effort is needed to calculate the input/output operations per second (IOPS) demands of the running gmetad server. A few hours spent on these calculations early on can avoid many hours of frustration later.
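As a rough illustration of the kind of calculation meant here, the following Python sketch estimates how many RRD updates per second a gmetad server would have to sustain. It is a back-of-the-envelope model, not a formula taken from Ganglia itself. It assumes one RRD file per metric per node, each updated once per polling interval. The function names, the 15-second interval, and the `ios_per_update` multiplier are illustrative assumptions that should be replaced with values measured in your own environment.

```python
def rrd_updates_per_second(nodes: int, metrics_per_node: int,
                           update_interval_s: float) -> float:
    """Estimate RRD file updates per second.

    Assumes (hypothetically) one RRD file per metric per node,
    each written once every update_interval_s seconds.
    """
    if update_interval_s <= 0:
        raise ValueError("update_interval_s must be positive")
    return nodes * metrics_per_node / update_interval_s


def estimated_iops(nodes: int, metrics_per_node: int,
                   update_interval_s: float,
                   ios_per_update: float = 1.0) -> float:
    """Scale the update rate by an assumed number of disk I/Os per update.

    ios_per_update is a placeholder: the real figure depends on the
    filesystem, caching, and RRD layout, and should be measured.
    """
    rate = rrd_updates_per_second(nodes, metrics_per_node, update_interval_s)
    return rate * ios_per_update


if __name__ == "__main__":
    # Illustrative inputs only: 1,000 nodes, 30 metrics each, 15 s interval.
    print(estimated_iops(1000, 30, 15))    # 2000.0
    # Same cluster with 1,000 metrics per node.
    print(round(estimated_iops(1000, 1000, 15)))
```

Even this simple model shows that the write load grows with the product of node count and metrics per node, so a storage system sized only for capacity can still fall short on IOPS.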

The largest Ganglia installation observed by any of the authors is at a tier-1 investment bank with more than 50,000 nodes. This chapter is a must-read for enterprises of that size: without the planning described here, a default Ganglia installation will appear to be completely broken and may even flood the network with metric data, interfering with normal business operations. If it’s set up correctly (with a custom configuration), the authors can confirm that Ganglia performs exceptionally well in such an environment.

In fact, the number of nodes is not the only factor that affects scalability. In a default installation, Ganglia collects about 30 metrics from each node. However, third-party metric modules can be used to collect more than 1,000 metrics per node, dramatically increasing the workload on the Ganglia architecture. Administrators of moderate-sized networks pursuing such aggressive monitoring strategies need to consider scalability in just ...
