Data Engineering with Google Cloud Platform

by Adi Wijaya
March 2022
Content level: Beginner to intermediate
440 pages
9h 43m
English
Packt Publishing

Chapter 5: Building a Data Lake Using Dataproc

A data lake is similar in concept to a data warehouse; the key difference is what you store in it. A data lake's role is to store as much raw data as possible, without first knowing the value or end goal of that data. Because of this difference, the way you store and access data in a data lake differs from what we learned in Chapter 3, Building a Data Warehouse in BigQuery.

This chapter helps you understand how to build a data lake using Dataproc, which is a managed Hadoop cluster in Google Cloud Platform (GCP). More importantly, it helps you understand the key benefit of using a data lake in the cloud: the ability to use ephemeral clusters.
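
To make the idea of an ephemeral cluster concrete, here is a minimal sketch (not the book's own code) of the lifecycle it implies: create a cluster, submit one job, then delete the cluster. The helper name `ephemeral_cluster_commands`, the cluster name, the region, and the `gs://example-bucket/...` path are all hypothetical placeholders. The sketch only builds the `gcloud dataproc` command lines and prints them; it does not run anything against GCP.

```python
import shlex


def ephemeral_cluster_commands(cluster, region, job_uri, num_workers=2):
    """Build the gcloud commands for a create -> run job -> delete cycle.

    All argument values are supplied by the caller; nothing is executed here.
    """
    create = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]
    submit = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", job_uri,
        f"--cluster={cluster}",
        f"--region={region}",
    ]
    delete = [
        "gcloud", "dataproc", "clusters", "delete", cluster,
        f"--region={region}",
        "--quiet",  # skip the interactive confirmation prompt
    ]
    return [create, submit, delete]


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    commands = ephemeral_cluster_commands(
        cluster="demo-ephemeral-cluster",
        region="us-central1",
        job_uri="gs://example-bucket/jobs/example_job.py",
    )
    for command in commands:
        print(shlex.join(command))
```

Because the data is assumed to live outside the cluster (for example, in Cloud Storage), deleting the cluster after the job finishes does not delete the data, which is what makes this short-lived pattern workable.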

Here is the high-level ...

Publisher Resources

ISBN: 9781800561328