O'Reilly logo

Optimizing Hadoop for MapReduce by Khaled Tannir

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

An overview of Hadoop MapReduce

Hadoop is the most popular open source Java implementation of the MapReduce programming model proposed by Google. There are many other implementations of MapReduce (such as Sphere, Starfish, Riak , and so on), which implement all the features described in the Google documentation or only a subset of these features.

Hadoop consists of distributed data storage engine and MapReduce execution engine. It has been successfully used for processing highly distributable problems across a large amount of datasets using a large number of nodes. These nodes collectively form a Hadoop cluster , which in turn consists of a single master node called JobTracker, and multiple worker (or slave) nodes; each worker node is called a ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required