O'Reilly logo

Optimizing Hadoop for MapReduce by Khaled Tannir

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Hadoop MapReduce internals

The MapReduce programing model can be used to process many large-scale data problems using one or more steps. Also, it can be efficiently implemented to support problems that deal with large amount of data using a large number of machines. In a Big Data context, the size of data processed may be so large that the data cannot be stored on a single machine.

In a typical Hadoop MapReduce framework, data is divided into blocks and distributed across many nodes in a cluster and the MapReduce framework takes advantage of data locality by shipping computation to data rather than moving data to where it is processed. Most input data blocks for MapReduce applications are located on the local node, so they can be loaded very fast, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required