CHAPTER 7

Hadoop Input/Output

Previous chapters outlined MapReduce concepts in detail, and at the end of Chapter 5 we began delving deeper into how Hadoop is implemented. This chapter expands on that theme. First, compression schemes are explained, followed by a detailed discussion of Hadoop I/O. We address various file types, such as Sequence and Avro files. In the process, you will develop a deeper understanding of how the MapReduce framework works internally in the Hadoop engine.

Compression Schemes

So far, you have learned the fundamentals of MapReduce. MapReduce is an I/O-intensive process. Reducing or optimizing I/O is the key ...
