Managing Cloud Dataflow jobs
Once a pipeline is up and running, there are limited options for managing its execution. Currently, developers may cancel or drain a running job. Canceling a job halts execution almost immediately. This makes it a good option for idempotent pipelines, where no state is lost if ingestion is interrupted and re-processing elements has no side effects. For example, a pipeline that performs a lift-and-shift from a CSV file in Cloud Storage into a BigQuery table with truncate-reload can likely be canceled mid-job and rerun later.
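Both operations are available from the `gcloud` CLI as `gcloud dataflow jobs cancel` and `gcloud dataflow jobs drain`. The sketch below is a minimal illustration, not a prescribed workflow. It assumes the Cloud SDK is installed and authenticated. The helper names `build_stop_command` and `stop_job`, and the job ID and region used in the usage line, are hypothetical. By default it runs as a dry run that only returns the command string, so it can be inspected before anything is actually stopped.

```python
import shlex
import subprocess
from typing import List


def build_stop_command(job_id: str, region: str, drain: bool = False) -> List[str]:
    """Return the gcloud argv that stops a Dataflow job (hypothetical helper).

    cancel: halts execution almost immediately; suited to idempotent pipelines.
    drain:  stops reading new input but lets in-flight elements finish.
    """
    if not job_id:
        raise ValueError("job_id must be non-empty")
    action = "drain" if drain else "cancel"
    return ["gcloud", "dataflow", "jobs", action, job_id, f"--region={region}"]


def stop_job(job_id: str, region: str, drain: bool = False, dry_run: bool = True) -> str:
    """Build the command and, unless dry_run is set, execute it via subprocess."""
    cmd = build_stop_command(job_id, region, drain)
    if not dry_run:
        # check=True raises CalledProcessError if gcloud exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical job ID and region; prints the command without running it.
    print(stop_job("example-job-id", "us-central1"))
```

Keeping the command construction separate from execution makes the cancel-versus-drain decision easy to review or log before it takes effect.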
However, canceling pipelines that consume data destructively, such as those with a PubsubIO source, will likely result in lost data. For cases like this, ...