Cloud Analytics with Google Cloud Platform

by Sanket Thodge
April 2018
Content level: Beginner to intermediate
282 pages
9h 53m
English
Packt Publishing

Pipelines

A pipeline in Cloud Dataflow represents a data processing job and encapsulates an entire series of computations. It can read input data from multiple external sources, transform that data, and write the output. Output data is typically written to an external data sink, which can be one of several GCP data storage services.

Dataflow can easily convert data from one format to another. A pipeline is built by writing a program using the Dataflow SDK.

A pipeline consists of two parts:

  • Data: Specialized collection classes called PCollection
  • Transforms: A step in your pipeline or a data processing operation
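
The two parts above can be illustrated with a small, self-contained sketch. It does not use the Dataflow SDK: `ToyCollection`, `parse_csv`, `keep_positive`, `to_json`, and `run_pipeline` are hypothetical names invented for this example, and the in-memory list stands in for a real external source and sink. It only shows the shape of the idea, namely that data lives in a collection and each transform produces a new collection, here converting CSV lines to JSON strings.

```python
import json
from typing import Callable, Iterable


class ToyCollection:
    """Hypothetical, immutable stand-in for a PCollection (not an SDK class)."""

    def __init__(self, elements: Iterable):
        # Freeze the elements so a transform cannot modify its input in place.
        self._elements = tuple(elements)

    def apply(self, transform: Callable[[tuple], Iterable]) -> "ToyCollection":
        # A transform is one step: it reads a collection and yields a new one.
        return ToyCollection(transform(self._elements))

    def to_list(self) -> list:
        return list(self._elements)


def parse_csv(elements):
    # Transform 1: turn "name,count" lines into dictionaries.
    for line in elements:
        name, count = line.split(",")
        yield {"name": name, "count": int(count)}


def keep_positive(elements):
    # Transform 2: drop rows whose count is zero or negative.
    return (row for row in elements if row["count"] > 0)


def to_json(elements):
    # Transform 3: convert each row to a JSON string (a format conversion).
    return (json.dumps(row, sort_keys=True) for row in elements)


def run_pipeline(source_lines):
    # Read from the (in-memory) source, chain the transforms, return the output.
    data = ToyCollection(source_lines)
    result = data.apply(parse_csv).apply(keep_positive).apply(to_json)
    return result.to_list()


if __name__ == "__main__":
    for line in run_pipeline(["apples,3", "pears,0", "plums,7"]):
        print(line)
```

In a real Dataflow job the source and sink would be external services and the SDK would distribute the work; the sketch runs everything sequentially in one process.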

ISBN: 9781788839686