Dataproc Cookbook

by Narasimha Sadineni, Anuyogam Venkataraman
June 2025
Beginner to intermediate
438 pages
9h 17m
English
O'Reilly Media, Inc.

Chapter 7. Connecting from Dataproc to GCP Services

Dataproc provides a managed framework for running Hadoop and Spark jobs that can connect to and interact with other GCP services. In this chapter, we’ll explore ways to connect Dataproc with popular GCP services such as Cloud SQL, BigQuery, Bigtable, and Pub/Sub Lite. We will also see how to configure Delta Lake tables on Dataproc and read them through BigLake.

In this chapter, you’ll get hands-on experience with the following connectors and tools:

Spark-BigQuery connector

A specialized connector for high-performance Dataproc-BigQuery transfers

Spark JDBC interface

An interface to connect Dataproc to Cloud SQL and other relational databases

Pub/Sub Lite–Spark connector

A connector to integrate Dataproc with Pub/Sub Lite’s real-time messaging

Dataproc templates

Preconfigured templates for common data tasks

Delta Lake on Dataproc

Used to write Delta Lake tables from Dataproc

BigLake integration

Used to query Delta Lake tables using BigLake

Reading from GCS and Writing to a BigQuery Table

Problem

You need a Spark job running on Dataproc to read CSV data from GCS, process it, and write the results to a BigQuery table.

Solution

To achieve this, use the Spark-BigQuery connector. It is preinstalled on Dataproc (image versions 2.1 and later), so no additional setup is required.

Here is the code snippet to write a DataFrame to BigQuery in append mode from Spark:

outputdf.write \
 .format("bigquery") \
 .option("writeMethod" ...
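Because the snippet above is cut off in this preview, the following is a minimal sketch of the whole read-process-write flow, not the authors' code. The helper name `bigquery_write_options`, the function `run`, the CSV read options, and the `dropDuplicates()` processing step are all assumptions chosen for illustration. The connector options `writeMethod` (`direct` or `indirect`) and `temporaryGcsBucket` are real Spark-BigQuery connector options, but the value the book passes to `writeMethod` is not visible here, so `direct` is only a guess at a sensible default.

```python
"""Sketch: read CSV from GCS, process it, and append to a BigQuery table."""


def bigquery_write_options(write_method="direct", temp_bucket=None):
    """Build Spark-BigQuery connector options (hypothetical helper).

    "direct" writes through the BigQuery Storage Write API; "indirect"
    stages files in a GCS bucket first, so it needs temporaryGcsBucket.
    """
    if write_method not in ("direct", "indirect"):
        raise ValueError(f"unsupported writeMethod: {write_method}")
    options = {"writeMethod": write_method}
    if write_method == "indirect":
        if not temp_bucket:
            raise ValueError("indirect writes need a temporary GCS bucket")
        options["temporaryGcsBucket"] = temp_bucket
    return options


def run(input_path, output_table, write_method="direct", temp_bucket=None):
    """Read CSV files, apply a placeholder transformation, append to BigQuery."""
    # Imported here so the helper above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gcs-to-bigquery").getOrCreate()

    # Assumption: the CSV files have a header row and the schema is inferred.
    inputdf = spark.read.csv(input_path, header=True, inferSchema=True)

    # Placeholder processing step: drop exact duplicate rows.
    outputdf = inputdf.dropDuplicates()

    # Append the result to the target table, given as "dataset.table".
    (outputdf.write
        .format("bigquery")
        .options(**bigquery_write_options(write_method, temp_bucket))
        .mode("append")
        .save(output_table))

    spark.stop()
```

On a Dataproc cluster, a call such as `run("gs://example-bucket/input/*.csv", "example_dataset.example_table")` would exercise the flow; the bucket and table names are hypothetical. Keeping the options in a small helper makes it easy to switch between direct and indirect writes without touching the write chain.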

ISBN: 9781098157692