Writing Cloud Dataflow results to BigQuery is a common pattern for both stream ingestion and batch ETL processes. Dataflow transforms and conditions data for storage, and BigQuery provides fast, expressive ad hoc exploration of that data. Cloud Dataflow offers first-class support for integrating with BigQuery through the BigQueryIO reader and writer.
BigQuery as a Cloud Dataflow Sink
BigQueryIO automatically adapts how it writes to BigQuery based on whether the pipeline is processing bounded or unbounded data. For bounded datasets, BigQueryIO performs inserts using batch file uploads. For unbounded datasets, inserts are performed using streaming insert API calls. This behavior can be overridden ...
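The default selection described above can be modeled in a few lines. This is a minimal sketch in plain Python, not the Dataflow SDK: `WriteMethod`, `choose_write_method`, and the `override` parameter are hypothetical names introduced only to illustrate the decision, and the way a real pipeline overrides the default is not shown here.

```python
from enum import Enum
from typing import Optional


class WriteMethod(Enum):
    """Hypothetical labels for the two insert strategies named in the text."""
    FILE_LOADS = "batch file uploads"
    STREAMING_INSERTS = "streaming insert API calls"


def choose_write_method(is_bounded: bool,
                        override: Optional[WriteMethod] = None) -> WriteMethod:
    """Model the default choice: bounded data is written with batch file
    uploads, unbounded data with streaming inserts. An explicit override
    (an assumption of this sketch) takes precedence over the default."""
    if override is not None:
        return override
    if is_bounded:
        return WriteMethod.FILE_LOADS
    return WriteMethod.STREAMING_INSERTS


if __name__ == "__main__":
    # A batch ETL job reads a bounded dataset.
    print(choose_write_method(is_bounded=True).value)
    # A streaming ingestion job reads an unbounded dataset.
    print(choose_write_method(is_bounded=False).value)
```

The point of the sketch is that the write strategy follows from the boundedness of the input rather than from anything the pipeline author configures, unless the default is explicitly overridden.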