
69 MapReduce Family of Large-Scale Data-Processing Systems
is then compiled and executed, along with bindings to connect to externally provided
aggregators. The data sets of Sawzall programs are often stored in the Google File
System (GFS) [58]. Scheduling a job to run on a cluster of machines is handled by
software called Workqueue, which creates a large-scale time-sharing system out of an
array of computers and their disks. It schedules jobs, allocates resources, reports
status, and collects the results.
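
The split between per-record logic and externally provided aggregators can be illustrated with a small sketch. The code below is plain Python, not Sawzall, and it runs in a single process rather than on a cluster. The names `SumAggregator`, `process_record`, and `run` are hypothetical and chosen only for illustration; the sketch assumes a simple word-count task.

```python
from collections import defaultdict


class SumAggregator:
    """Hypothetical aggregator: adds up every value emitted under a key."""

    def __init__(self):
        self.totals = defaultdict(int)

    def emit(self, key, value):
        # The aggregator, not the per-record code, owns the combined state.
        self.totals[key] += value


def process_record(record, aggregator):
    """Per-record logic: looks at one record in isolation and only emits."""
    for word in record.split():
        aggregator.emit(word.lower(), 1)


def run(records, aggregator):
    # Assumption: a real system would spread the records over many machines
    # and merge partial aggregates; this sketch uses one sequential loop.
    for record in records:
        process_record(record, aggregator)
    return dict(aggregator.totals)


if __name__ == "__main__":
    print(run(["to be or not", "To be"], SumAggregator()))
```

Because `process_record` never reads the aggregated state, each record can in principle be handled independently, which is what makes this style of program easy to distribute.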
Google has also developed FlumeJava [30], a Java library for developing and run-
ning data-parallel pipelines on top of MapR ...