Appendix B. Ganglia and Hadoop/HBase
You’ve got data—lots and lots of data that’s just too valuable to delete or take offline for even a minute. Your data likely comes in a number of different formats, and you know it will only grow larger and more complex over time. Don’t fret. The growing pains you’re facing have been faced by others before you, and there are systems built to handle them: Hadoop and HBase.
If you want to use Ganglia to monitor a Hadoop or HBase cluster, I have good news—Ganglia support is built in.
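In recent Hadoop versions, that support is exposed through the metrics2 framework, which can be pointed at Ganglia with a sink configuration. The snippet below is a minimal sketch, assuming a Ganglia 3.1-compatible gmond; the hostname and port are placeholders you would replace with your own collector:

```
# hadoop-metrics2.properties (sketch — adjust names to your cluster)
# Route all metric sources to a Ganglia 3.1 sink, emitting every 10 seconds.
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

# Send NameNode and DataNode metrics to a gmond instance.
# gmond.example.com:8649 is a placeholder for your gmond host and UDP port.
namenode.sink.ganglia.servers=gmond.example.com:8649
datanode.sink.ganglia.servers=gmond.example.com:8649
```

After editing this file on each node and restarting the Hadoop daemons, their metrics should begin appearing alongside the standard host metrics in the Ganglia web interface.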
Introducing Hadoop and HBase
Hadoop is an Apache-licensed open source system modeled after Google’s MapReduce and Google File System (GFS) systems. Hadoop was created by Doug Cutting, who now works as an architect at Cloudera and serves as chair of the Apache Software Foundation. He named Hadoop after his son’s yellow stuffed toy elephant.
With Hadoop, you can grow the size of your filesystem by adding more machines to your cluster. This feature allows you to grow storage incrementally, regardless of whether you need terabytes or petabytes of space. Hadoop also ensures your data is safe by automatically replicating your data to multiple machines. You could remove a machine from your cluster and take it out to a grassy field with a baseball bat to reenact the printer scene from Office Space—and not lose a single byte of data.
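That replication behavior is controlled by a single setting. As a rough sketch (property names are standard HDFS, but the value shown is illustrative), you could set the replication factor in hdfs-site.xml:

```
<!-- hdfs-site.xml (sketch) -->
<configuration>
  <property>
    <!-- Number of copies HDFS keeps of each block; 3 is the usual default. -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With a replication factor of 3, every block of every file lives on three separate machines, which is why losing one node to a baseball bat costs you no data.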
The Hadoop MapReduce engine breaks data processing up into smaller units of work and intelligently distributes them across your cluster. The MapReduce ...