Apache Hudi: The Definitive Guide

by Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, Rebecca Bilbro
October 2025
Intermediate to advanced
290 pages
7h 43m
English
O'Reilly Media, Inc.
Content preview from Apache Hudi: The Definitive Guide

Preface

Why This Book, and Why Now

Modern data platforms are being asked to do more than ever before. They must serve fresh data to dashboards, power machine learning features in real time, and support operational applications alongside traditional analytics. At the same time, volumes of data are growing rapidly, pipelines are increasingly complex, and organizations cannot afford downtime or inconsistency. The gap between what businesses expect and what legacy systems can deliver has only widened.

Apache Hudi emerged to address exactly this gap. By bringing transactions, incremental ingestion, and advanced table services to the data lake, Hudi redefined what was possible. It pioneered the data lakehouse architecture, which unifies the openness and scalability of lakes with the reliability and performance of warehouses. In recent years, Hudi has matured into one of the most widely adopted open table formats, supported by a vibrant community and deployed at scale in industries ranging from technology and finance to retail and research.

The world of data architecture is at an inflection point. Lakehouses have transitioned from a cutting-edge idea to an industry standard. Hudi has kept pace, introducing powerful features such as multiwriter concurrency control, metadata-driven optimizations, and integrated streaming ingestion. Yet with this power comes the responsibility to make the right choices—there are design trade-offs, operational considerations, and architectural choices that ...



Publisher Resources

ISBN: 9781098173821