Chapter 17. Monitoring and Automation
As use of Hadoop clusters in the cloud grows in your organization, it becomes more important to be able to monitor them. Early on, monitoring focuses on ensuring that the clusters are fully up and not overloaded; this information can help guide your future choices for instance types, cluster size, storage size, and network configuration. As time goes on, monitoring data will become more important for keeping tabs on overall cloud expenditure. Of course, cloud clusters themselves will also become more crucial to the organization, so it becomes doubly important to be sure they are working properly.
Monitoring matters somewhat less when clusters are transient (see “Long-Running or Transient?”). A transient cluster does not survive for long, so if it does have problems, it can be torn down and replaced using systems that are already in place. Long-running clusters, on the other hand, need more monitoring, because they must remain in good shape for continuous or on-demand use.
There are two facets to monitoring cloud clusters: monitoring the cloud provider resources themselves, and monitoring the Hadoop components running on them. As you may have noticed, the cloud providers’ consoles already deliver health information, so it’s good to start by considering the monitoring features that the providers offer.
When you’re ready to start monitoring Hadoop clusters, you’ll find that you have choices for which monitoring system ...