Chapter 20. Cloud Dataflow: large-scale data processing

This chapter covers

  • What do we mean by data processing?
  • What is Apache Beam?
  • What is Cloud Dataflow?
  • How can you use Apache Beam and Cloud Dataflow together to process large sets of data?

You’ve probably heard the term data processing before, likely meaning something like “taking some data and transforming it somehow.” More specifically, when we talk about data processing, we tend to mean taking a lot of data (measured in GB at least), potentially combining it with other data, and ending with either an enriched data set of similar size or a smaller data set that summarizes some aspects of the huge pile of data. For example, imagine you had all of your email history in one big pile, and ...