Precreating regions using your own algorithm
When we create a table in HBase, the table starts with a single region, and all data inserted into the table goes to that region. As the data keeps growing and the region reaches a size threshold, region splitting happens: the region is split into two halves so that the table can handle more data.
In a write-heavy HBase cluster, this approach has several issues that need to be fixed:
- The split/compaction storm issue.
As data grows uniformly, most of the regions are split at the same time, which causes heavy disk I/O and network traffic.
- Load is not well balanced until enough regions have been split.
Especially right after the table is created, all requests go to the same region server where ...
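Both issues come from letting regions appear only through splitting, so one way to see what precreating regions involves is to compute the region boundaries up front. The following is a minimal sketch, not the recipe's own algorithm. It assumes row keys are fixed-width, lowercase hexadecimal strings (for example, hashed keys) that are spread uniformly over the key space. The class name SplitKeySketch and the method evenHexSplits are hypothetical names introduced for illustration, and the code deliberately uses no HBase classes so it can be run on its own.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class SplitKeySketch {

    /**
     * Hypothetical helper: returns (numRegions - 1) boundary keys that divide
     * a fixed-width hex key space into numRegions equally sized ranges.
     * Assumption: row keys are lowercase hex strings of exactly hexDigits characters.
     */
    static byte[][] evenHexSplits(int numRegions, int hexDigits) {
        if (numRegions < 2) {
            return new byte[0][]; // a single region needs no split keys
        }
        // Size of the key space: 16^hexDigits == 2^(4 * hexDigits)
        BigInteger range = BigInteger.ONE.shiftLeft(4 * hexDigits);
        BigInteger regions = BigInteger.valueOf(numRegions);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary = range * i / numRegions
            BigInteger boundary = range.multiply(BigInteger.valueOf(i)).divide(regions);
            // Zero-pad so every boundary has the same width as the row keys
            String hex = String.format("%0" + hexDigits + "x", boundary);
            splits[i - 1] = hex.getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the boundaries for 4 regions over an 8-digit hex key space
        for (byte[] key : evenHexSplits(4, 8)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

With 4 regions and 8 hex digits, the sketch prints the three boundaries 40000000, 80000000, and c0000000. An array like this could then be handed to a table-creation call that accepts split keys, such as the createTable overload of the HBase admin API that takes a byte[][] argument. If the real row keys are not uniformly distributed hex strings, evenly spaced boundaries will not balance the load, and the boundaries would need to be derived from the actual key distribution instead.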