In this chapter, we will develop a large specification to fully show what the process looks like, ideation, missteps and all.
Problem Overview
MapReduce was one of the first Big Data algorithms
. It helped Google scale quickly and handle huge amounts of data, providing the foundation of Hadoop and the big data revolution. Instead of doing a calculation on a single computer, you distribute it among several computers (the map) and use one to combine the data after. The typical example is counting the number of words in 1,000,000 books. It might not fit in memory, so here’s how you can MapReduce the calculation ...