Apache Hudi: The Definitive Guide

by Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, Rebecca Bilbro
October 2025
Intermediate to advanced
290 pages
7h 43m
English
O'Reilly Media, Inc.
Content preview from Apache Hudi: The Definitive Guide

Chapter 2. Getting Started with Hudi

In Chapter 1, we explored the foundational concepts that make Apache Hudi a compelling choice for modern data architectures. We traced how data lakes have evolved into lakehouses, discussed Hudi’s position in this ecosystem, and reviewed its high-level architecture, the Hudi stack, and key features. While these concepts provide essential context, the best way to truly understand Hudi’s capabilities is through hands-on experience.

This chapter shifts from theory to practice. Rather than simply listing features, we’ll demonstrate how Hudi tables behave under different configurations and operations, allowing you to observe firsthand how the underlying table layout evolves as you perform common lakehouse operations.

We’ll start with a simple purchase tracking table and use Apache Spark to perform typical Create, Read, Update, and Delete (CRUD) operations. As we execute these commands, we’ll examine the resulting changes to the table’s physical structure, helping you develop an intuitive understanding of how Hudi organizes and manages your data behind the scenes.

The chapter is organized into three sections that build on each other. “Basic Operations” creates a Hudi table using the default Copy-on-Write (COW) table type and explores fundamental CRUD operations. As we execute SQL examples, we’ll examine how each operation affects the table layout and learn core concepts like record keys, partitioning, and the timeline internals. ...
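
To make the shape of these operations concrete, here is a minimal sketch of what a CRUD sequence against a COW table might look like in Spark SQL, wrapped in Python. It is not taken from the book: the table name `purchases`, its columns, and the helper `run_crud` are hypothetical, and the table properties (`type`, `primaryKey`, `preCombineField`) follow Hudi's Spark SQL conventions as commonly documented, so check them against the Hudi version you run. It also assumes a Spark session that was started with the Hudi Spark bundle on its classpath.

```python
# Hypothetical purchase-tracking table. Column names and values are
# illustrative assumptions, not the book's actual example.
CRUD_STATEMENTS = [
    # Create: a Copy-on-Write table keyed by purchase_id and
    # partitioned by purchase_date (the partition column comes last).
    """
    CREATE TABLE IF NOT EXISTS purchases (
        purchase_id STRING,
        customer_id STRING,
        amount DOUBLE,
        ts BIGINT,
        purchase_date STRING
    ) USING hudi
    PARTITIONED BY (purchase_date)
    TBLPROPERTIES (
        type = 'cow',
        primaryKey = 'purchase_id',
        preCombineField = 'ts'
    )
    """,
    # Insert two records into two different partitions.
    "INSERT INTO purchases VALUES "
    "('p1', 'c1', 20.0, 1, '2025-01-01'), "
    "('p2', 'c2', 35.5, 2, '2025-01-02')",
    # Read the records back.
    "SELECT purchase_id, amount FROM purchases ORDER BY purchase_id",
    # Update one record, identified by its record key.
    "UPDATE purchases SET amount = 25.0 WHERE purchase_id = 'p1'",
    # Delete the other record.
    "DELETE FROM purchases WHERE purchase_id = 'p2'",
]


def run_crud(spark):
    """Run each statement in order on a SparkSession-like object.

    Returns whatever spark.sql() returned for each statement, in order.
    """
    return [spark.sql(statement) for statement in CRUD_STATEMENTS]
```

With a real Hudi-enabled session, `run_crud(spark)` would return one DataFrame per statement; calling `.show()` on the third one would display the rows as they stood after the insert. Inspecting the table's storage directory between statements is one way to watch the layout change.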

Publisher Resources

ISBN: 9781098173821