March 2019
Beginner to intermediate
778 pages
34h 20m
English
Google has invested significant resources into developing tools and techniques to meet their own internal data processing needs. Starting with MapReduce in 2004, Google set out to tackle big data with a divide and conquer approach, spreading batch data processing workloads across many machines. MapReduce provided a foundational framework for writing complex and highly-parallel data processing pipelines by breaking down data processing tasks into map, to filter and sort inputs into logical subsets, and reduce, to summarize the subsets through aggregate operations. The open source community quickly latched onto this concept and built an entire ecosystem around it, resulting in projects like Apache Hadoop ...
Read now
Unlock full access