Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode

This recipe explains how to implement a simple MapReduce program that counts the number of occurrences of words in a dataset. WordCount is well known as the "Hello World" equivalent of Hadoop MapReduce.
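The map and reduce functions for WordCount can be sketched as follows. This is a minimal version assuming the MRv2 (org.apache.hadoop.mapreduce) API; the class names WordCountMapper and WordCountReducer are placeholders and may differ from the recipe's accompanying sample code.

    // WordCountMapper.java - emits (word, 1) for every word in each input line.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // The default TextInputFormat supplies the byte offset as the key
        // and one line of text as the value.
        for (String token : value.toString().split("\\s+")) {
          if (!token.isEmpty()) {
            word.set(token);
            context.write(word, ONE); // zero or more output pairs per input pair
          }
        }
      }
    }

    // WordCountReducer.java - sums the counts collected for each word.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
          sum += count.get();
        }
        context.write(key, new IntWritable(sum));
      }
    }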

To run a MapReduce job, users should supply a map function, a reduce function, the input data, and a location to store the output data; a driver sketch that wires these pieces together follows the list below. When the job is executed, Hadoop carries out the following steps:

  1. Hadoop uses the supplied InputFormat to break the input data into key-value pairs and invokes the map function once for each pair, passing the pair as the input. The map function may output zero or more key-value pairs.
  2. Hadoop transmits the key-value pairs emitted by the map functions to the reducers, sorting them by key and grouping together all the values that belong to the same key (the shuffle and sort phase).
  3. For each distinct key, Hadoop invokes the reduce function once, passing that key and the list of its values as the input; the key-value pairs emitted by the reduce function are written to the specified output location.
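A driver that supplies these pieces to Hadoop could look roughly like the following; the WordCount class name, the argument order, and the use of the reducer as a combiner are assumptions of this sketch rather than requirements of the recipe.

    // WordCount.java - driver that wires the mapper, reducer, and I/O paths together.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // optional local aggregation
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Once the classes are compiled and bundled into a JAR (the name wordcount.jar below is a placeholder), the job can be run with hadoop jar wordcount.jar WordCount <input_dir> <output_dir>. With no cluster configuration in place, Hadoop defaults to local (standalone) mode and executes the entire job in a single JVM, which is convenient for developing and debugging the application before deploying it to a cluster.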
