Use Iceberg with AWS

by Kasun Indrasiri
May 2025
O'Reilly Media, Inc.

AWS analytics services such as Amazon EMR, AWS Glue, Amazon Athena, and Amazon Redshift include native support for Apache Iceberg, so you can build transactional data lakes on top of Amazon Simple Storage Service (Amazon S3). All of these services integrate with the AWS Glue Data Catalog and use it as the Iceberg catalog.
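As a minimal sketch of what "using the Glue Data Catalog as the Iceberg catalog" means from Apache Spark, the function below assembles the Spark properties that register a Glue-backed Iceberg catalog. The catalog name `glue_catalog` and the bucket `s3://example-bucket/warehouse` are hypothetical placeholders, and the sketch assumes the Iceberg Spark runtime and Iceberg AWS bundle JARs are already on the Spark classpath. The property keys and class names follow the Apache Iceberg AWS integration.

```python
"""Sketch: Spark properties for an Iceberg catalog backed by the AWS Glue Data Catalog.

Assumptions (hypothetical, not from the text): the catalog name and the S3
warehouse path are placeholders; Iceberg JARs are already on the classpath.
"""


def glue_iceberg_spark_conf(catalog_name: str, warehouse: str) -> dict[str, str]:
    """Return Spark config entries that register a Glue-backed Iceberg catalog."""
    if not warehouse.startswith("s3://"):
        raise ValueError("warehouse should be an S3 URI, e.g. s3://bucket/prefix")
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions such as MERGE INTO and CALL procedures
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog name with Spark using Iceberg's SparkCatalog
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Keeps table metadata pointers in the AWS Glue Data Catalog
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Default S3 location for new tables' data and metadata files
        f"{prefix}.warehouse": warehouse,
        # Reads and writes table files through Iceberg's native S3 client
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


if __name__ == "__main__":
    conf = glue_iceberg_spark_conf("glue_catalog", "s3://example-bucket/warehouse")
    for key, value in conf.items():
        # Printed in the same form as spark-submit --conf arguments
        print(f"--conf {key}={value}")
```

The same entries can be applied in code by calling `SparkSession.builder.config(key, value)` for each pair before `getOrCreate()`; tables are then addressed as `glue_catalog.<database>.<table>`.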

The following figure illustrates a data pipeline architecture on AWS that uses Apache Iceberg for data management. Raw data is sourced either from Amazon S3 or from streaming services such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Kinesis. Ingestion tools such as Amazon EMR, AWS Glue, or Kinesis Data Analytics (now Amazon Managed Service for Apache Flink) process the data, while the AWS Glue Data Catalog or AWS Lake Formation handles metadata management. The processed data is then stored in Apache Iceberg tables on Amazon S3, where consumer tools including Amazon Redshift, Amazon Athena, Amazon EMR, and Amazon SageMaker can query and analyze it efficiently.

Figure 0. Apache Iceberg data pipeline architecture on AWS

Let’s take a closer look at some of these services:

Amazon Athena

An interactive query service that enables users to analyze data directly in Amazon S3 using standard SQL, eliminating the need for complex data loading or ETL processes.
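To make this concrete for Iceberg, the sketch below builds an Athena `CREATE TABLE` statement that uses `TBLPROPERTIES ('table_type'='ICEBERG')`, which is how Athena creates Iceberg tables, together with the keyword arguments for the boto3 Athena client's `start_query_execution` call. The database, table, columns, partition transform, and S3 paths are hypothetical, and the helper does no quoting or validation of identifiers, so it assumes trusted input.

```python
"""Sketch: build an Athena Iceberg CREATE TABLE statement and a query request.

Assumptions (hypothetical, not from the text): all names and S3 paths are
placeholders; identifiers are trusted and are not escaped.
"""


def athena_iceberg_ddl(database, table, columns, location, partition_transform=None):
    """Return Athena DDL that creates an Iceberg table at the given S3 location."""
    col_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {database}.{table} (\n  {col_sql})"
    if partition_transform:
        # Iceberg hidden partitioning, e.g. day(order_ts) or bucket(16, id)
        ddl += f"\nPARTITIONED BY ({partition_transform})"
    # table_type = ICEBERG tells Athena to create an Iceberg table
    ddl += f"\nLOCATION '{location}'\nTBLPROPERTIES ('table_type'='ICEBERG')"
    return ddl


def athena_request(query, database, output_location):
    """Return kwargs for boto3's athena.start_query_execution(...)."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results and metadata to this S3 prefix
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    ddl = athena_iceberg_ddl(
        "iceberg_db",
        "orders",
        [("order_id", "bigint"), ("customer_id", "string"), ("order_ts", "timestamp")],
        "s3://example-bucket/iceberg/orders/",
        partition_transform="day(order_ts)",
    )
    print(ddl)
    print(athena_request(ddl, "iceberg_db", "s3://example-bucket/athena-results/"))
```

To run the statement, pass the request to `boto3.client("athena").start_query_execution(**request)`. Once the table exists, standard SQL such as `SELECT`, `INSERT INTO`, and `MERGE INTO` works against it.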

Amazon EMR

A managed cluster platform that simplifies running big data frameworks like Apache Hadoop and Apache Spark on AWS to process ...
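One way to enable Iceberg on Amazon EMR is at cluster creation, through configuration classifications. The sketch below assembles an `iceberg-defaults` classification (with `iceberg.enabled` set to `"true"`) and a `spark-defaults` classification that points a Spark catalog at the AWS Glue Data Catalog. The catalog name and S3 warehouse path are hypothetical placeholders; the output is the JSON shape accepted by `aws emr create-cluster --configurations` and by the `Configurations` parameter of boto3's `run_job_flow`.

```python
"""Sketch: EMR configuration classifications that enable Iceberg with Spark.

Assumptions (hypothetical, not from the text): the catalog name and the S3
warehouse path are placeholders chosen for illustration.
"""
import json


def emr_iceberg_configurations(catalog_name: str, warehouse: str) -> list[dict]:
    """Return EMR Configurations entries for Iceberg with a Glue-backed catalog."""
    catalog = f"spark.sql.catalog.{catalog_name}"
    return [
        {
            # Turns on the Iceberg integration on the cluster
            "Classification": "iceberg-defaults",
            "Properties": {"iceberg.enabled": "true"},
        },
        {
            # Defaults applied to every Spark application on the cluster
            "Classification": "spark-defaults",
            "Properties": {
                "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
                catalog: "org.apache.iceberg.spark.SparkCatalog",
                f"{catalog}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
                f"{catalog}.warehouse": warehouse,
            },
        },
    ]


if __name__ == "__main__":
    configs = emr_iceberg_configurations("glue_catalog", "s3://example-bucket/warehouse")
    # Save this output as configurations.json for the AWS CLI
    print(json.dumps(configs, indent=2))
```

With the output saved as `configurations.json`, it can be passed as `--configurations file://configurations.json` to `aws emr create-cluster`. Baking the settings into the cluster means individual Spark jobs do not have to repeat the catalog properties.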


ISBN: 9781098175412