MapReduce, the popular data-intensive distributed computing model is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation.
Hadoop is the most popular open source Java implementation of the Google's MapReduce programming model. It is already being used for large-scale data analysis tasks by many companies and is often used for jobs where low response time is critical.
Before going deep into MapReduce programming and Hadoop performance tuning, we will review the MapReduce model basics and learn about factors that affect Hadoop's performance.
In this chapter, we will cover the following: