Chapter 4

MapReduce Model

Abstract

The MapReduce model was developed by Google and Yahoo for their internal use. Google created the Hadoop distributed file system and Yahoo developed Pig Latin to handle their volume of data. These products became open source. Hadoop dominates the NoSQL market as part of the SMAQ stack, the NoSQL counterpart of the LAMP stack for websites. The process has two phases: mapping and reducing. The mapping phase gets the data in a parallelized fashion. The reduce phase filters and aggregates this data to produce a final result.

Keywords

ETL (extract transform load); Google; Hadoop; HDFS (Hadoop distributed file system); LAMP stack; MapReduce; Pig Latin; RAID storage systems; SMAQ stack; Yahoo

Introduction

This chapter discusses ...

Get Joe Celko’s Complete Guide to NoSQL now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.