July 2018
Intermediate to advanced
506 pages
16h 2m
English
Google has invested significant resources into developing tools and techniques to meet its own internal data processing needs. Starting with MapReduce in 2004, Google set out to tackle big data with a divide-and-conquer approach, spreading batch data processing workloads across many machines. MapReduce provided a foundational framework for writing complex and highly parallel data processing pipelines by breaking tasks into two phases: map, which filters and sorts inputs into logical subsets, and reduce, which summarizes those subsets through aggregate operations. The open source community quickly latched onto this concept and built an entire ecosystem around it, resulting in projects like Apache Hadoop ...
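To make the map/reduce split concrete, here is a minimal single-machine sketch of the paradigm in plain Python; the names (map_phase, reduce_phase, mapreduce) are illustrative only, not Google's API, and in a real system the map and reduce workers would run in parallel across many machines with a distributed shuffle in between.

    from itertools import groupby
    from operator import itemgetter

    def map_phase(document):
        # Map: emit a (key, value) pair for every word in the input.
        for word in document.split():
            yield (word, 1)

    def reduce_phase(key, values):
        # Reduce: aggregate all values that share the same key.
        return (key, sum(values))

    def mapreduce(documents):
        # Shuffle: sort and group the intermediate pairs by key,
        # then hand each group to the reduce phase.
        pairs = sorted(
            (pair for doc in documents for pair in map_phase(doc)),
            key=itemgetter(0),
        )
        return [
            reduce_phase(key, (value for _, value in group))
            for key, group in groupby(pairs, key=itemgetter(0))
        ]

    if __name__ == "__main__":
        docs = ["the quick brown fox", "the lazy dog", "the fox"]
        print(mapreduce(docs))
        # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1),
        #  ('quick', 1), ('the', 3)]

The word-count example is the classic illustration: because each map call touches only one document and each reduce call touches only one key, both phases can be scattered across machines independently, which is what makes the approach scale.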