Chapter 3. Storage: The Heart of the Lakehouse

The storage layer is the heart of any data platform. In a lakehouse architecture, it is responsible for persisting all types of data efficiently and has a direct effect on query performance. The lakehouse storage layer consists of three components: cloud storage, file formats, and table formats. This chapter explains these components and the technologies available to implement them.

I’ll explain the fundamental concepts related to lakehouse storage, the difference between row-wise and columnar stores, and how storage is closely associated with performance. We will then dive deep into the file formats used to store data for analytics use cases, the benefits of using each format, and the key features you should consider while building a data platform.

Once you understand these concepts, it will be easier to discuss this chapter’s core topic: open table formats. We will discuss the leading table formats, their features and benefits, and the specific limitations that you should keep in mind when making design decisions.

In the last section of this chapter, I’ll discuss the key design considerations for choosing the right table format for your use case. This will help you make better design decisions in your day-to-day projects.

Lakehouse Storage: Key Concepts

The storage layer is the backbone of a data ecosystem. When you implement a data platform, you need a durable, ...
