
solution is to have each mapper try to coalesce its messages before sending
to the shuffler.
The coalescing is done by a combiner specified by the user. Often the
combiner is the same code as the reducer. In that case, each mapper runs
the reducer on its own output and sends the combiner output to the
shuffler, after which it goes to the reducers as usual.
Thus in our word count example, when a line arrives at a reducer, its
count field may already have a value greater than 1. The combiner code,
by the way, is specified via the -combiner option in the run command, just
as the mapper and reducer are specified via -mapper and -reducer.
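
To make the idea concrete, here is a minimal sketch of what a word count combiner might look like in Python. It assumes each mapper output line has the form word, tab, count; that line format and the function name combine() are illustrative assumptions, not necessarily what the code in this chapter uses. Because the function sums counts rather than counting lines, the same logic can serve as the reducer, which may receive counts greater than 1.

```python
from collections import Counter

def combine(lines):
    """Coalesce word-count records of the assumed form 'word<TAB>count'.

    Counts are summed rather than lines counted, so the input may be raw
    mapper output (all counts 1) or already-combined output (counts > 1).
    """
    totals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t")
        totals[word] += int(count)
    # Emit one record per distinct word, sorted by word for readability.
    return [f"{word}\t{total}" for word, total in sorted(totals.items())]

if __name__ == "__main__":
    # Hypothetical output of one mapper: three records, two for "the".
    mapper_output = ["the\t1\n", "cat\t1\n", "the\t1\n"]
    for record in combine(mapper_output):
        print(record)
```

In an actual streaming job the function would be fed the script's standard input instead of an in-memory list. Here the mapper sends two records to the shuffler rather than three; on real text, where common words repeat many times within one mapper's input, the savings can be much larger.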
9.1.5 Role of Disk