Chapter 1. Understanding Hadoop MapReduce

MapReduce, the popular data-intensive distributed computing model is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation.

Hadoop is the most popular open source Java implementation of the Google's MapReduce programming model. It is already being used for large-scale data analysis tasks by many companies and is often used for jobs where low response time is critical.

Before going deep into MapReduce programming and Hadoop performance tuning, we will review the MapReduce model basics and learn about factors that affect Hadoop's performance.

In this chapter, we will cover the following:

  • The MapReduce model
  • An overview ...

Get Optimizing Hadoop for MapReduce now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.