Get full access to Mastering Hadoop 3 and 60K+ other titles, with a free 10-day trial of O'Reilly.

Deep dive into the Hadoop MapReduce framework

The story of Hadoop started with HDFS and MapReduce. Hadoop version 1 provided the basic features for storing and processing data on a distributed platform, and it has evolved considerably since then. Hadoop version 2 introduced major changes, such as NameNode high availability and a new resource management framework called YARN. However, the high-level flow of MapReduce processing has stayed the same despite various changes to its API.

MapReduce consists of two major steps, map and reduce, along with multiple minor steps that make up the process flow from map tasks to reduce tasks. Mappers are responsible for performing map tasks, while reducers are responsible for reduce tasks. The job of the mapper is to process ...
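
To make the map-to-reduce flow concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API; it only simulates the flow in a single process. The word-count task and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are assumptions chosen for illustration, and the grouping step stands in for the shuffle and sort that sit between map and reduce tasks.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper (hypothetical): emit a (word, 1) pair for each word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key and sort the keys, mimicking shuffle and sort."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer (hypothetical): sum all counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, then shuffle, then reduce over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the mappers and reducers run as separate tasks on different nodes and the framework performs the grouping, but the order of the steps is the same as in this sketch.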
