August 2012
Intermediate to advanced
332 pages
English
When we create a table in HBase, the table starts with a single region, and all data inserted into the table goes to that region. As the data keeps growing and the region reaches a size threshold, a region split happens: the single region is split into two halves so that the table can handle more data.
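The splitting behavior described above can be sketched as a small simulation. This is only an illustrative model, not HBase code: the class name `RegionSplitSketch`, the method `simulate`, and the tiny 1000-byte threshold are all hypothetical, and region sizes are tracked as plain numbers. In a real cluster the threshold is governed by the `hbase.hregion.max.filesize` setting.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplitSketch {
    // Hypothetical threshold chosen for illustration only; a real cluster
    // configures the split size through hbase.hregion.max.filesize.
    static final long SPLIT_THRESHOLD_BYTES = 1000;

    /**
     * Simulates evenly spread writes against a table that starts with one
     * region, and returns the size of each region afterwards.
     */
    static List<Long> simulate(long totalBytes, long writeSize) {
        List<Long> regions = new ArrayList<>();
        regions.add(0L); // a new table starts with a single, empty region

        int next = 0;
        for (long written = 0; written < totalBytes; written += writeSize) {
            // Spread writes round-robin to model uniformly growing data.
            int target = next % regions.size();
            next++;

            long size = regions.get(target) + writeSize;
            if (size >= SPLIT_THRESHOLD_BYTES) {
                // Threshold reached: split the region into two halves.
                long half = size / 2;
                regions.set(target, half);
                regions.add(target + 1, size - half);
            } else {
                regions.set(target, size);
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        List<Long> regions = simulate(5000, 100);
        System.out.println(regions.size() + " regions: " + regions);
    }
}
```

In this model, a table that receives less data than the threshold stays at one region, while a larger load ends up spread over several regions whose sizes all stay below the threshold. Because the writes are spread evenly, sibling regions grow at the same rate and tend to reach the threshold close together.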
In a write-heavy HBase cluster, this approach has several issues that need to be fixed:
Because data grows uniformly, most of the regions split at the same time, which causes heavy disk I/O and network traffic.
Especially right after the table is created, all requests go to the same region server where ...