The Cloud Data Lake

by Rukmani Gopalan
December 2022
Beginner to intermediate
244 pages
7h
English
O'Reilly Media, Inc.

Chapter 4. Scalable Data Lakes

If you change the way you look at things, the things you look at change.

Wayne Dyer

After reading the first three chapters, you should have all you need to get your data lake architecture up and running on the cloud at a cost profile that is reasonable for your organization. Theoretically, you also have the first set of use cases and scenarios running successfully in production. Your data lake is so successful that demand for more scenarios has grown, and you are busy serving the needs of your new customers. Your business is booming, and your data estate is growing rapidly. As they say in business, going from zero to one is a different challenge than going from one to one hundred, or from one hundred to one thousand. To ensure that your design scales and continues to perform as your data and use cases grow, it’s important to understand the factors that affect the scale and performance of your data lake. Contrary to popular opinion, scale and performance do not always have to be traded off against cost; in fact, they very much go hand in hand. In this chapter, we will take a closer look at these considerations, as well as strategies to optimize your data lake for scale while continuing to optimize for cost. Once again, we will use Klodars Corporation, a fictitious organization, to illustrate our strategies. We will build on these fundamentals to focus on performance in Chapter 5.

A Sneak Peek into Scalability

Scale and performance are terms ...

ISBN: 9781098116576