Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode
This recipe explains how to implement a simple MapReduce program that counts the number of occurrences of each word in a dataset. WordCount is widely known as the "Hello World" of Hadoop MapReduce.
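As a point of reference, the result a WordCount job should produce can be computed sequentially on a single machine. The sketch below is plain Java, not Hadoop code. The class name `SequentialWordCount` is hypothetical, and splitting on whitespace with `StringTokenizer` is an assumption; a real job might also normalize case or strip punctuation.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Sequential reference implementation (not Hadoop): counts how many times
// each whitespace-separated token occurs in the input text.
class SequentialWordCount {

    static Map<String, Integer> count(String text) {
        // TreeMap keeps the words sorted so the printed result is easy to read.
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            // Add 1 to the word's running total, starting at 1 if unseen.
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("the quick fox jumps over the lazy dog the end"));
    }
}
```

Running `main` prints `{dog=1, end=1, fox=1, jumps=1, lazy=1, over=1, quick=1, the=3}`. Comparing a small job's output against a baseline like this is a quick sanity check.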
To run a MapReduce job, users should supply a map function, a reduce function, input data, and a location to store the output data. When executed, Hadoop carries out the following steps:
- Hadoop uses the supplied InputFormat to break the input data into key-value pairs and invokes the map function for each key-value pair, providing the key-value pair as the input. When executed, the map function can output zero or more key-value pairs.
- Hadoop transmits the key-value ...
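The steps above can be sketched as an in-memory simulation in plain Java. This is not the Hadoop API: in a real job the map and reduce functions would be methods of `Mapper` and `Reducer` subclasses, and Hadoop itself would split the input and group the intermediate pairs. All names here are hypothetical. `readRecords` is an assumed stand-in for an InputFormat that keys each line by its character offset (a simplification of a byte offset, assuming single-byte characters), and the grouping loop in `run` is a simplified stand-in for what Hadoop does between the map and reduce phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory simulation of the WordCount data flow. Plain Java, not the Hadoop API.
class WordCountFlowSketch {

    // Hypothetical InputFormat stand-in: turns the input into (offset, line)
    // pairs, where the key is the character offset at which each line starts.
    static List<Map.Entry<Long, String>> readRecords(String input) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : input.split("\n", -1)) {
            records.add(new SimpleEntry<>(offset, line));
            offset += line.length() + 1; // +1 for the newline separator
        }
        return records;
    }

    // map: one input pair in, zero or more (word, 1) pairs out.
    // The offset key is ignored; WordCount only needs the line text.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out; // an empty line produces no pairs
    }

    // reduce: one word and all of its values in, a single total out.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(String input) {
        // Group map output by key (simplified stand-in for Hadoop's shuffle and sort).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<Long, String> record : readRecords(input)) {
            for (Map.Entry<String, Integer> pair : map(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Call reduce once per distinct word, in sorted key order.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run("to be or\n\nnot to be"));
    }
}
```

Running `main` prints `{be=2, not=1, or=1, to=2}`. The empty middle line makes `map` emit zero pairs, matching the "zero or more key-value pairs" behavior described for the map function.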
From Hadoop MapReduce v2 Cookbook - Second Edition.