Hadoop design is based on the Sequence file format, which is used to append key/value pairs; this stems from the HDFS append-only capability.
This design is retrofitted by a concept of MapFiles and an extension of SequenceFile.
MapFile is nothing but a bundle of two Sequences Files in a directory. The first file is
/data and the second is
/index. This allows us to append key/value pairs and every
N key; we can configure
N as needed. This setup also allows us to store the key and the offset in the index. This gives us the flexibility to do extremely fast lookups as the data and the index have less entries. Once you are aware of the block, data file location can be done at a very fast pace.
MapFile is effective as we can look up keys and ...