Cloud Dataflow

Cloud Dataflow is a service based on Apache Beam, an open source framework for creating data processing pipelines. A pipeline is essentially a piece of code that defines how we wish to process our data. Once a pipeline has been constructed and submitted to the service, it becomes a Dataflow job. This is where we can process the data ingested by Pub/Sub. The job performs steps that change our data from one format to another, and it can transform both real-time streaming data and historical batch data. Dataflow is completely serverless and fully managed: it will spin up and tear down the resources needed to execute our Dataflow job. As an example, a pipeline job might be made up of several steps. If a specific step requires ...
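To make the idea of a pipeline as "a piece of code made up of several steps" more concrete, here is a minimal sketch in plain Python. It deliberately does not use the Apache Beam SDK or any Dataflow API; the step functions (`parse_json`, `keep_valid`, `to_csv`), the `run_pipeline` helper, and the sample records are all hypothetical names invented for illustration. It only shows that the same chain of steps can be applied to a finite batch and to messages arriving one at a time.

```python
import json

# NOTE: illustrative only. This is plain Python, not the Apache Beam SDK,
# and every name below is a hypothetical stand-in.

def parse_json(lines):
    """Step 1: turn raw JSON strings into dictionaries."""
    for line in lines:
        yield json.loads(line)

def keep_valid(records):
    """Step 2: drop records that lack the fields we need."""
    for record in records:
        if "user" in record and "amount" in record:
            yield record

def to_csv(records):
    """Step 3: change the format from a dictionary to a CSV line."""
    for record in records:
        yield f"{record['user']},{record['amount']}"

def run_pipeline(source, steps):
    """Chain the steps lazily so elements flow through one at a time."""
    data = source
    for step in steps:
        data = step(data)
    return data

STEPS = [parse_json, keep_valid, to_csv]

# Historical batch data: a finite list that is available up front.
batch = [
    '{"user": "ann", "amount": 5}',
    '{"note": "missing fields"}',
    '{"user": "bob", "amount": 7}',
]
batch_result = list(run_pipeline(batch, STEPS))

# Stream-like data: a generator standing in for messages arriving over time.
def message_stream():
    yield '{"user": "cy", "amount": 3}'
    yield '{"user": "di", "amount": 9}'

stream_result = list(run_pipeline(message_stream(), STEPS))

print(batch_result)
print(stream_result)
```

Because each step is a generator, an element passes through every step before the next one is read, which is why the same step list works for both the list and the generator source. In a real Dataflow job the pipeline would be written with the Beam SDK and the service would manage the workers that run each step.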