Chapter 8. Advanced and Alternate MapReduce Techniques

This chapter discusses techniques for handling larger jobs with more complex requirements. In particular, the section on map-side joins covers the case in which the input data is already sorted, and the section on chaining discusses ways of adding mapper classes to a job without passing all the job data through the network multiple times.
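
Sorted input matters because two inputs sorted on the same key can be joined in one sequential pass, with no shuffle needed to bring matching keys together. The Java below is a hypothetical, single-process sketch of that merge-join idea. It is not Hadoop's map-side join API. The class and method names are illustrative, and the sketch assumes each key appears at most once per input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch of a merge join over pre-sorted inputs.
// Not Hadoop code; names are illustrative only.
public class SortedMergeJoinSketch {

    // A simple key/value record.
    public static final class Rec {
        final String key;
        final String value;

        public Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Inner-joins two inputs that are already sorted by key.
    // Assumes keys are unique within each input.
    public static List<String> join(List<Rec> left, List<Rec> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // left key is smaller: it has no partner on the right
            } else if (cmp > 0) {
                j++; // right key is smaller: it has no partner on the left
            } else {
                // Matching keys: emit key<TAB>leftValue,rightValue
                out.add(left.get(i).key + "\t" + left.get(i).value + "," + right.get(j).value);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> left = List.of(new Rec("a", "1"), new Rec("b", "2"), new Rec("d", "4"));
        List<Rec> right = List.of(new Rec("b", "x"), new Rec("c", "y"), new Rec("d", "z"));
        for (String line : join(left, right)) {
            System.out.println(line);
        }
    }
}
```

Each input is read once, front to back. This is why a join on already-sorted, identically partitioned data can run entirely in the map phase.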

The traditional MapReduce job involves providing a pair of Java classes to handle the map and reduce tasks. The job reads a set of input files, either text files using KeyValueTextInputFormat or binary sequence files using SequenceFileInputFormat. It writes the sorted result set out using TextOutputFormat or SequenceFileOutputFormat. The framework will schedule the map tasks if possible ...
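
The traditional flow can be mimicked in a single process: parse input records, map, sort and group by key, reduce, and write the output. The Java below is a hypothetical sketch, not Hadoop code, and its class and method names are illustrative. Its simplifications are assumptions:

* Splitting at the first tab only loosely mirrors KeyValueTextInputFormat's default separator.
* A `TreeMap` stands in for the framework's shuffle and sort.
* Values are assumed to be integers, and records with an empty value are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of map -> sort/group -> reduce.
// It does NOT use Hadoop classes; names are illustrative only.
public class MapSortReduceSketch {

    // Splits a line at the first tab; a line with no tab becomes (line, "").
    static String[] parseKeyValue(String line) {
        int tab = line.indexOf('\t');
        return tab < 0
                ? new String[] {line, ""}
                : new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    // "Map" step: emit (key, integer value); assumes numeric values, skips empty ones.
    static void map(String key, String value, List<Map.Entry<String, Integer>> emitted) {
        if (value.isEmpty()) {
            return;
        }
        emitted.add(Map.entry(key, Integer.parseInt(value.trim())));
    }

    // "Reduce" step: sum all values grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static List<String> run(List<String> inputLines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : inputLines) {
            String[] kv = parseKeyValue(line);
            map(kv[0], kv[1], emitted);
        }
        // Stand-in for the shuffle: TreeMap sorts keys and groups their values.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Emit key<TAB>value lines, similar in shape to TextOutputFormat's default output.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            out.add(g.getKey() + "\t" + reduce(g.getKey(), g.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("pear\t2", "apple\t1", "pear\t3");
        for (String line : run(input)) {
            System.out.println(line);
        }
    }
}
```

In a real job, each of these steps is spread across many tasks and machines. The grouping step is where all the map output crosses the network, which is the cost the techniques in this chapter try to reduce.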
