HBase High Performance Cookbook by Ruchir Choudhry


Bulk utilities

The process for loading data using the bulk load utilities is very similar, and breaks down into three steps (a driver sketch follows the list):

  1. Extracting data from the source.
  2. Transforming the data into HFiles.
  3. Loading the files into HBase by telling the region servers where to find them.
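The second and third steps can be wired together in one MapReduce driver. The following is a minimal sketch, assuming an HBase 1.x client, a target table named mytable that already exists, and simple comma-separated input of the form rowkey,value; the class and column names (BulkLoadDriver, CsvToPutMapper, cf:qual) are illustrative only. HFileOutputFormat2.configureIncrementalLoad aligns the generated HFiles with the table's current region boundaries, and LoadIncrementalHFiles then points the region servers at them (the same final step can also be run from the shell with the completebulkload tool).

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Illustrative mapper: turns "rowkey,value" lines into Puts for column cf:qual.
  public static class CsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path input = new Path(args[0]);     // extracted source data on HDFS
    Path hfileDir = new Path(args[1]);  // staging directory for the generated HFiles

    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      TableName tableName = TableName.valueOf("mytable");
      Table table = conn.getTable(tableName);
      RegionLocator locator = conn.getRegionLocator(tableName);

      // Step 2: transform the input into HFiles, partitioned by region boundary.
      Job job = Job.getInstance(conf, "bulk-load-hfile-generation");
      job.setJarByClass(BulkLoadDriver.class);
      job.setMapperClass(CsvToPutMapper.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(Put.class);
      FileInputFormat.addInputPath(job, input);
      FileOutputFormat.setOutputPath(job, hfileDir);
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

      if (!job.waitForCompletion(true)) {
        System.exit(1);
      }

      // Step 3: hand the HFiles over to the region servers that own each key range.
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      loader.doBulkLoad(hfileDir, conn.getAdmin(), table, locator);
    }
  }
}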

Getting ready...

Keep the following points in mind when using the bulk load utilities:

  • The HBase/Hadoop cluster with MapReduce/YARN must be running. You can run jps to verify the daemons.
  • The user/group running the program needs the appropriate access rights.
  • The table schema needs to be designed around the structure of the input data.
  • Split points need to be taken into consideration (see the pre-split sketch after this list).
  • The entire stack (compaction, splits, block size, max file size, flush size, versions, compression, memstore size, block cache, garbage collection, nproc, and so on) needs to be fine-tuned ...
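As a sketch of the split-point consideration, assume row keys with a hexadecimal prefix and a single column family cf; the table name mytable and the three split keys below are illustrative. Pre-creating the table with explicit split points lets the bulk load job write HFiles for several regions in parallel instead of funnelling everything into a single region that then has to split under load.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("mytable"));
      desc.addFamily(new HColumnDescriptor("cf"));

      // Three explicit split keys give four regions covering hex-prefixed row keys:
      // (-inf, "4"), ["4", "8"), ["8", "c"), ["c", +inf)
      byte[][] splitKeys = new byte[][] {
          Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c")
      };
      admin.createTable(desc, splitKeys);
    }
  }
}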
