Evolution of data processing at Google

Google has invested significant resources into developing tools and techniques to meet its own internal data processing needs. Starting with MapReduce in 2004, Google set out to tackle big data with a divide-and-conquer approach, spreading batch data processing workloads across many machines. MapReduce provided a foundational framework for writing complex, highly parallel data processing pipelines by breaking tasks into a map phase, which filters and sorts inputs into logical subsets, and a reduce phase, which summarizes those subsets through aggregate operations. The open source community quickly latched onto this concept and built an entire ecosystem around it, resulting in projects like Apache Hadoop ...
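To make the map/reduce split concrete, the following is a minimal single-process sketch of the pattern using the classic word-count example. It is illustrative only: it does not reflect Google's internal MapReduce API or the Hadoop API, and the function names (map_phase, shuffle, reduce_phase) are hypothetical.

# A minimal sketch of the map/reduce pattern, shown as a word count.
# Illustrative only -- runs both phases in one process, with no distribution.
from collections import defaultdict

def map_phase(documents):
    """Map: emit (key, value) pairs -- here, (word, 1) for every word."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Group intermediate pairs by key so each reducer sees one key's values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(grouped):
    """Reduce: aggregate each key's values -- here, summing the counts."""
    return {key: sum(values) for key, values in grouped}

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    counts = reduce_phase(shuffle(map_phase(docs)))
    print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}

In a real MapReduce deployment, the map and reduce functions are distributed across many worker machines and the shuffle step moves intermediate data between them; the sketch above only captures the logical decomposition.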
