Data Engineering with Google Cloud Platform - Second Edition
by Adi Wijaya
April 2024
Content level: Beginner to intermediate
476 pages
12h 22m
English
Packt Publishing

Chapter 5: Building a Data Lake Using Dataproc

A data lake resembles a data warehouse, but the two differ fundamentally in what they store. Unlike a data warehouse, a data lake is designed to hold large volumes of raw data, regardless of that data's eventual value or purpose. This difference changes how data is stored and retrieved in a data lake, and sets it apart from the principles that we learned in Chapter 3, Building a Data Warehouse in BigQuery.

This chapter shows you how to build a data lake using Dataproc, a managed Hadoop cluster service in Google Cloud Platform (GCP). More importantly, it helps you understand the key benefit of using a data lake in the cloud, which is allowing the use of ...


ISBN: 9781835080115