Apache Hudi: The Definitive Guide

by Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, Rebecca Bilbro
October 2025
Intermediate to advanced
290 pages
7h 43m
English
O'Reilly Media, Inc.
Content preview from Apache Hudi: The Definitive Guide

Chapter 8. Building a Lakehouse Using Hudi Streamer

In modern organizations, data silos create more than just fragmented data; they foster fragmented efforts. Teams across the business often find themselves independently solving the same data engineering problems, building similar ETL tools, and defining their own conventions for schemas and formats. This redundancy not only wastes valuable resources but also erects significant barriers to sharing and normalizing data. The core challenge becomes a strategic one: how can an organization move beyond this inefficiency to provide a standardized set of tools and a unified platform? How can it empower teams to collaborate on ingesting and transforming data, while sharing common datasets, catalogs, and monitoring dashboards?

The modern answer to this challenge is the data lakehouse, and Apache Hudi is a particularly strong choice for building one. If your organization is suffering from data silos and has not yet converged on a single data storage solution, Hudi offers more flexibility than the alternatives. Not only does Hudi permit different parts of an organization to maintain sovereignty over their data stacks and architectures, but it also provides a specialized ingestion tool—Hudi Streamer—that can connect to a wide array of upstream sources and streamline the construction of a data lakehouse.

In this chapter, we’ll meet Alcubierre, a fictional airline company grappling with these common data silo challenges. As we imagine ourselves ...


ISBN: 9781098173821