Factors affecting the performance of MapReduce

Many factors can affect how long MapReduce takes to process its input data. One is the algorithm you use to implement your map and reduce functions; other, external factors also play a part. Based on our experience and observation, the following are the major factors that may affect MapReduce performance:

  • Hardware (or resources), such as CPU clock speed, disk I/O, network bandwidth, and memory size.
  • The underlying storage system.
  • The size of the input, shuffle, and output data, which is closely correlated with the runtime of a job.
  • Job algorithms (or programs), such as map, reduce, partition, combine, and compress. Some algorithms may be hard to conceptualize ...
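
To make the link between job algorithms and shuffle data size concrete, here is a minimal sketch in plain Python. It is not Hadoop code: it only simulates a word-count job in memory, and the function names (map_words, combine, reduce_counts, run_job) and the sample input splits are hypothetical, chosen for illustration. It compares the number of records that would cross the shuffle with and without a combine step.

```python
from collections import Counter

def map_words(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def combine(pairs):
    """Combine step: pre-aggregate one mapper's output before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reduce_counts(shuffled):
    """Reduce step: sum the counts for each word across all mappers."""
    totals = Counter()
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)

def run_job(splits, use_combiner):
    """Run the simulated job; return the result and the shuffle record count."""
    shuffled = []
    for split in splits:
        pairs = map_words(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)  # stands in for data sent over the network
    return reduce_counts(shuffled), len(shuffled)

# Hypothetical input: two small splits, one per simulated mapper.
splits = ["a b a a", "b b a c"]
plain_result, plain_records = run_job(splits, use_combiner=False)
combined_result, combined_records = run_job(splits, use_combiner=True)

print(plain_records, combined_records)
```

Both runs produce the same word counts, but the run with the combine step shuffles fewer records. In a real cluster the saving depends on how often keys repeat within each mapper's output, so treat this as an illustration of the idea rather than a measurement.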
