Executing pipelines on Cloud Dataflow

After validating that the pipeline executes locally, we can switch to running it on Google's managed infrastructure. To do this, first ensure that the Cloud Dataflow API is enabled for your project by navigating to https://console.cloud.google.com/apis/library and searching for "dataflow". Select Dataflow API and click ENABLE.
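Alternatively, the API can be enabled from the command line. This sketch assumes the gcloud CLI is installed and authenticated against the same project:

```shell
# Enable the Dataflow API for the currently configured project
gcloud services enable dataflow.googleapis.com
```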

In the pipeline we created, we use TextIO.read(). In addition to local files, this IO accepts URLs, including paths to Cloud Storage buckets and objects. Before executing the pipeline on Dataflow, upload the same input file to a new Cloud Storage bucket. First create the bucket, giving it a globally unique name:

gsutil mb gs://<YOUR_BUCKET_NAME>

Next, upload the sample ...
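With the input data in Cloud Storage, the pipeline is launched by passing Dataflow-specific pipeline options to the same program we ran locally. The sketch below assumes a Java project following the Beam quickstart's Maven layout; the main class, bucket name, and the --inputFile option are placeholders for your own pipeline, not names from the text:

```shell
# Launch the pipeline on Dataflow (class, bucket, and --inputFile are placeholders)
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=<YOUR_PROJECT_ID> \
    --region=us-central1 \
    --tempLocation=gs://<YOUR_BUCKET_NAME>/temp \
    --inputFile=gs://<YOUR_BUCKET_NAME>/input.txt"
```

The DataflowRunner stages the pipeline's code in the bucket given by --tempLocation and submits the job to the Dataflow service; progress can be monitored in the Dataflow section of the Cloud Console.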
