Google Cloud Platform in Action

by John J. (JJ) Geewax
September 2018
Intermediate to advanced
632 pages
21h 40m
English
Manning Publications

Chapter 20. Cloud Dataflow: large-scale data processing

This chapter covers

  • What do we mean by data processing?
  • What is Apache Beam?
  • What is Cloud Dataflow?
  • How can you use Apache Beam and Cloud Dataflow together to process large sets of data?

You’ve probably heard the term data processing before, likely meaning something like “taking some data and transforming it somehow.” More specifically, when we talk about data processing, we tend to mean taking a lot of data (measured in GB at least), potentially combining it with other data, and ending with either an enriched data set of similar size or a smaller data set that summarizes some aspects of the huge pile of data. For example, imagine you had all of your email history in one big pile, and ...
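The two outcomes described above (an enriched data set of similar size, or a smaller summary) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the email records, the contact lookup, and the function names `enrich` and `summarize` are hypothetical and do not come from the book, and a real workload at the GB scale would run on a framework such as Apache Beam rather than on in-memory lists.

```python
from collections import Counter

# Hypothetical input records (illustrative only, not from the source text).
emails = [
    {"sender": "alice@example.com", "words": 120},
    {"sender": "bob@example.com", "words": 45},
    {"sender": "alice@example.com", "words": 80},
]

# Hypothetical second data set to combine with the first.
contacts = {
    "alice@example.com": "Alice",
    "bob@example.com": "Bob",
}


def enrich(records, lookup):
    """Return one output record per input record, with a display name added."""
    return [{**r, "name": lookup.get(r["sender"], "unknown")} for r in records]


def summarize(records):
    """Collapse many records into a total word count per sender."""
    totals = Counter()
    for r in records:
        totals[r["sender"]] += r["words"]
    return dict(totals)


enriched = enrich(emails, contacts)  # same number of records, more fields
summary = summarize(emails)          # fewer entries than input records
```

Here `enriched` has exactly as many entries as `emails`, while `summary` has one entry per distinct sender, which mirrors the "similar size" versus "smaller data set" distinction in the paragraph.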

ISBN: 9781617293528