Practical Lakehouse Architecture

by Gaurav Ashok Thalpati
July 2024
Intermediate to advanced
286 pages
7h 53m
English
O'Reilly Media, Inc.

Chapter 3. Storage: The Heart of the Lakehouse

The storage layer is the heart of any data platform. In platforms based on lakehouse architecture, it plays a significant role in efficiently persisting all types of data and improving the performance of queries. The lakehouse storage layer consists of cloud storage, file formats, and table formats. In this chapter, we will focus on understanding these concepts and the available technologies to implement the lakehouse storage layer.

I’ll explain the fundamental concepts related to lakehouse storage, the difference between row-wise and columnar stores, and how storage is closely associated with performance. We will then dive deep into the file formats used to store data for analytics use cases, the benefits of using each format, and the key features you should consider while building a data platform.

Once you understand these concepts, it will be easier to discuss this chapter’s core topic: open table formats. We will discuss the leading table formats, their features and benefits, and the specific limitations that you should keep in mind when making design decisions.

In the last section of this chapter, I’ll discuss the key design considerations for choosing the right table format for your use case. This will help you to make better design decisions while working on your day-to-day projects.

Lakehouse Storage: Key Concepts

The storage layer is the backbone of a data ecosystem. When you implement a data platform, you need a durable, ...

ISBN: 9781098153007