There are many lifecycle stages for an HBase cluster, including the initial planning, installation, and, eventually, the deployment of workloads. Once a cluster is in operation, it may become necessary to change its size or add extra measures for failover scenarios, all while the cluster is in use. Data should be backed up and/or moved between distinct clusters. In this chapter, we will look at how this can be done with minimal to no interruption.
This section introduces the various tasks necessary while operating a cluster, including adding and removing nodes. First is a discussion about HBase sizing, as this may affect subsequent cluster administration tasks.
Sizing HBase is one of the longer-standing exercises, and one that repeatedly causes concern. That concern is not really necessary, as sizing just needs a little bit of background on how HBase uses the allotted Java heap. The following will recap many of the concepts and information explained throughout this book. I will point to the detailed locations where applicable.
The default split of heap usage is 40% for writes (the memstores), 40% for reads (the block cache, which used to be 20% in earlier versions), and the rest is for HBase itself to operate properly (refer to “Heap Tuning” for details). What is hidden here is the part where we need to store information about all open regions and their files. This includes block index and Bloom filter data from the actual storage files, ...
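To make the default split concrete, the following minimal sketch works through the arithmetic for a given region server heap size. The class name, method names, and the 10 GiB heap are hypothetical and chosen only for illustration. The percentages are the defaults quoted above; to my knowledge they correspond to the hbase.regionserver.global.memstore.size and hfile.block.cache.size properties, but the sketch does not read any configuration.

```java
// Illustrative only: computes the default heap split described in the text.
// HeapSplitSketch and its methods are hypothetical names, not part of HBase.
final class HeapSplitSketch {

    // Default shares of the region server heap, in percent (from the text).
    static final long MEMSTORE_PERCENT = 40;     // writes: memstores
    static final long BLOCK_CACHE_PERCENT = 40;  // reads: block cache

    // Heap reserved for the memstores.
    static long memstoreBytes(long heapBytes) {
        return heapBytes * MEMSTORE_PERCENT / 100;
    }

    // Heap reserved for the block cache.
    static long blockCacheBytes(long heapBytes) {
        return heapBytes * BLOCK_CACHE_PERCENT / 100;
    }

    // Whatever is left for HBase itself, including per-region and
    // per-file bookkeeping such as block indexes and Bloom filters.
    static long remainingBytes(long heapBytes) {
        return heapBytes - memstoreBytes(heapBytes) - blockCacheBytes(heapBytes);
    }

    public static void main(String[] args) {
        long heap = 10L * 1024 * 1024 * 1024; // assumed 10 GiB heap
        System.out.println("memstores:   " + memstoreBytes(heap) + " bytes");
        System.out.println("block cache: " + blockCacheBytes(heap) + " bytes");
        System.out.println("remaining:   " + remainingBytes(heap) + " bytes");
    }
}
```

The point of computing the remainder explicitly is that it is the budget the "hidden" per-region and per-file data has to fit into, next to everything else HBase needs to operate.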