O'Reilly logo

HBase High Performance Cookbook by Ruchir Choudhry

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Read path

Hadoop design is based on the Sequence file format, which is used to append key/value pairs; this stems from the HDFS append-only capability.

This design is retrofitted by a concept of MapFiles and an extension of SequenceFile.

MapFile is nothing but a bundle of two Sequences Files in a directory. The first file is /data and the second is /index. This allows us to append key/value pairs and every N key; we can configure N as needed. This setup also allows us to store the key and the offset in the index. This gives us the flexibility to do extremely fast lookups as the data and the index have less entries. Once you are aware of the block, data file location can be done at a very fast pace.

MapFile is effective as we can look up keys and ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required