This recipe explains how to implement a simple MapReduce program to count the number of occurrences of words in a dataset. WordCount is famous as the HelloWorld equivalent for Hadoop MapReduce.
To run a MapReduce job, users should supply a map function, a reduce function, input data, and a location to store the output data. When executed, Hadoop carries out the following steps:
Hadoop invokes the map function for each key-value pair, providing the key-value pair as the input. When executed, the map function can output zero or more key-value pairs.
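The contract described above can be illustrated with a small in-memory sketch in plain Python. It does not use the Hadoop API: `run_job`, `map_fn`, and `reduce_fn` are hypothetical names chosen for illustration, the input keys are assumed to be byte offsets of each line, and the grouping and reduce stages are simplified stand-ins for what a real cluster does across many machines.

```python
from collections import defaultdict


def map_fn(key, value):
    """Called once per input key-value pair; emits zero or more (word, 1) pairs."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Called once per distinct key with all values emitted for that key."""
    yield key, sum(values)


def run_job(records, map_fn, reduce_fn):
    """Hypothetical single-process driver that mimics the map and reduce stages."""
    # Map stage: invoke map_fn for every input pair and group its output by key.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)

    # Reduce stage: invoke reduce_fn once per distinct key, in sorted key order.
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reduce_fn(key, grouped[key]):
            results[out_key] = out_value
    return results


# Assumed input: (byte offset, line text) pairs; the empty line yields no output.
records = [(0, "hello hadoop"), (13, "hello world"), (25, "")]
counts = run_job(records, map_fn, reduce_fn)
print(counts)  # {'hadoop': 1, 'hello': 2, 'world': 1}
```

The empty third record shows the "zero or more" part of the contract: `map_fn` is still invoked for it but emits nothing, so it contributes no counts.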