Chapter 1: Introduction to Data Processing with Apache Beam

Data. Big data. Real-time data. Data streams. Many buzzwords to describe many things, and yet they have many common properties. Mind-blowing applications can be developed from the successful application of (theoretically) simple logic – take data and produce knowledge. However, a simple-sounding task can turn out to be difficult when the amount of data needed to produce knowledge is huge (and still growing). Given the vast volumes of data produced by humanity every day, which tools should we choose to turn our simple logic into scalable solutions? That is, solutions that protect our investment in creating the data extraction logic, even in the presence of new requirements arising or ...

Get Building Big Data Pipelines with Apache Beam now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.