Google Cloud Platform for Developers

by Ted Hunter, Steven Porter
July 2018
Intermediate to advanced
506 pages
16h 2m
English
Packt Publishing

Collections

Dataflow pipelines operate on data as collections, represented by the PCollection abstraction. Each PCollection represents a distributed set of homogeneous data as it flows through the pipeline. A PCollection may represent a bounded data source, such as a specific CSV file in Cloud Storage, or an unbounded data source, such as a Cloud Pub/Sub topic.

A PCollection is immutable, meaning elements cannot be added to or removed from the collection once it is created. It does not support random access, such as looking up an element by ID. Elements within a PCollection must also be serializable, as they undergo binary serialization between transforms. These design constraints force developers to treat each element individually, optimizing ...
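The constraints above can be illustrated with a small toy model. This is only a sketch in plain Python, not the Dataflow or Apache Beam API: the `ToyCollection` class, its `apply` method, and the sample data are hypothetical names introduced here for illustration, and `pickle` stands in for whatever binary encoding a real pipeline would use.

```python
import pickle
from typing import Any, Callable, Iterable, Iterator


class ToyCollection:
    """Hypothetical stand-in for a PCollection; NOT the Apache Beam API."""

    def __init__(self, elements: Iterable[Any]) -> None:
        # Serialize every element up front, mimicking the binary
        # serialization elements undergo between transforms. An element
        # that cannot be pickled fails here.
        self._encoded = tuple(pickle.dumps(e) for e in elements)

    def __iter__(self) -> Iterator[Any]:
        # Elements can only be streamed in order. There is deliberately
        # no __getitem__, so lookup by index or ID is unsupported.
        return (pickle.loads(b) for b in self._encoded)

    def apply(self, fn: Callable[[Any], Any]) -> "ToyCollection":
        # A transform sees one element at a time and produces a new
        # collection; the input collection is left unchanged.
        return ToyCollection(fn(e) for e in self)


words = ToyCollection(["alpha", "beta", "gamma"])
lengths = words.apply(len)

print(list(lengths))  # [5, 4, 5]
print(list(words))    # ['alpha', 'beta', 'gamma']
```

In this sketch, `words[0]` raises a `TypeError` because the class defines no indexing, and the class exposes no method for adding or removing elements. Because `apply` handles each element independently of the others, the work could in principle be divided among many workers, which is the kind of per-element processing the constraints encourage.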



ISBN: 9781788837675