Chapter 14. Real-World Use Cases of Apache Iceberg

In this chapter, we will dive into some of the real-world applications of Apache Iceberg and provide you with hands-on experience in running different analytical use cases supported by a lakehouse architecture. These use cases include ensuring data quality in data lakes, building business intelligence (BI) reports, and implementing critical processes such as change data capture (CDC). Additional use cases covering a real-time analytical architecture, machine learning (ML) workloads, and slowly changing dimensions (SCDs) are available at this supplemental repository. This chapter is a practical introductory guide, showcasing how to tackle essential real-world applications using Iceberg and highlighting its adaptability and importance as a core element in any data architecture.

Ensuring High-Quality Data with Write-Audit-Publish in Apache Iceberg

Maintaining the highest level of data quality is crucial for deriving meaningful insights. If data quality is compromised at any point in a data engineering workflow, it can adversely affect subsequent analyses such as BI and predictive analytics. For example, consider an extract, transform, and load (ETL) process: it takes data from an operational system and transfers it to an analytical system for use in BI reports or ad hoc analyses. If the original data has duplicates or inconsistencies or if such issues are introduced during the ETL process and are not addressed before reaching the production ...
