After validating that the pipeline executes locally, we can switch to running it on Google's managed infrastructure. To do this, first ensure that the Cloud Dataflow API is enabled for your project by navigating to https://console.cloud.google.com/apis/library and searching for dataflow. Select Dataflow API and click ENABLE.
In the pipeline we created, we use TextIO.read(). In addition to local files, this IO accepts URIs, including paths to Cloud Storage buckets and files. Before executing the pipeline in Dataflow, upload the same input file to a new Cloud Storage bucket. First create the bucket, giving it a globally unique name:
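Switching the input from a local path to Cloud Storage only changes the URI passed to TextIO.read(). A minimal Beam sketch of such a read, assuming the Beam SDK is on the classpath (the bucket name is the placeholder from above, and input.txt is a hypothetical sample file):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class GcsReadSketch {
  public static void main(String[] args) {
    // Pipeline options can later carry Dataflow-specific settings
    // (project, region, runner) when we move off the local runner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // TextIO.read() accepts a gs:// URI exactly as it accepts a local path;
    // both the bucket and the file name here are placeholders.
    p.apply("ReadFromGcs",
        TextIO.read().from("gs://<YOUR_BUCKET_NAME>/input.txt"));

    p.run().waitUntilFinish();
  }
}
```

The rest of the pipeline is unchanged; only the source URI differs between the local and Dataflow executions.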
gsutil mb gs://<YOUR_BUCKET_NAME>
Next, upload the sample ...