Google's MapReduce framework
How do you quickly sort a list of a billion elements? Or multiply two matrices, each with a million rows and a million columns?
In implementing their PageRank algorithm, Google quickly discovered the need for a systematic framework for processing massive datasets. That can be done only by distributing the data and the processing over many storage units and processors. Implementing a single algorithm, such as PageRank in that environment is difficult, and maintaining the implementation as the dataset grows is even more challenging.
The answer is to separate the software into two levels: a framework that manages the big data access and parallel processing at a lower level, and a couple of user-written methods at an upper-level. ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access