October 2018
Beginner
220 pages
5h 33m
English
In Chapter 3, Deep Dive into the Hadoop Distributed File System, we already studied how to solve the problem of storing many small files, that is, files smaller than the HDFS block size. In addition to the SequenceFile approach, you can also use the Hadoop Archives (HAR) mechanism to store multiple small files together. Hadoop archive files always have the .har extension. Each Hadoop archive holds index information along with one or more part files that contain the archived data. Hadoop provides the HarFileSystem class to work with HAR files. A Hadoop archive is created with the archiving tool from the Hadoop command-line interface. To create an archive from multiple files, use the following command:
hrishikesh@base0:/$ hadoop archive -archiveName ...
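The following toy model makes the "index plus part files" layout concrete. It is a minimal sketch, not Hadoop's implementation, and the real HAR on-disk format differs in detail. The function names `pack` and `extract` and the file names `part-0` and `_index` are hypothetical choices for this illustration. The sketch assumes bash with `wc` and `dd` available, and file names that contain no whitespace.

```shell
#!/usr/bin/env bash
# Toy model only (hypothetical names): packs small files into one "part" file
# plus a plain-text index of (name, offset, length). Not the real HAR format.

# pack OUT_DIR FILE...  -> creates OUT_DIR/part-0 and OUT_DIR/_index
pack() {
  local out_dir=$1
  shift
  mkdir -p "$out_dir"
  : > "$out_dir/part-0"   # concatenated contents of all input files
  : > "$out_dir/_index"   # one line per file: name offset length
  local offset=0 f len
  for f in "$@"; do
    len=$(( $(wc -c < "$f") ))   # size in bytes; $(( )) strips wc's padding
    cat "$f" >> "$out_dir/part-0"
    printf '%s %d %d\n' "$(basename "$f")" "$offset" "$len" >> "$out_dir/_index"
    offset=$(( offset + len ))
  done
}

# extract OUT_DIR NAME  -> prints the bytes of NAME found via the index
extract() {
  local out_dir=$1 name=$2 n off len
  while read -r n off len; do
    if [ "$n" = "$name" ]; then
      # Copy LEN bytes starting at byte OFF of the part file.
      dd if="$out_dir/part-0" bs=1 skip="$off" count="$len" 2>/dev/null
      return 0
    fi
  done < "$out_dir/_index"
  return 1   # NAME is not listed in the index
}
```

Reading one small file back requires an index lookup followed by a seek into the shared part file. This indirection is what lets many small files occupy a few large files instead of one file each.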