shortcut

Use Iceberg with AWS

by Kasun Indrasiri

May 2025

Intermediate

5 pages

English

O'Reilly Media, Inc.

Content preview from Use Iceberg with AWS

Use Iceberg with AWS

AWS analytics services such as Amazon EMR, AWS Glue, Amazon Athena, and Amazon Redshift include native support for Apache Iceberg, so you can build transactional data lakes on top of Amazon Simple Storage Service (Amazon S3). All of these services integrate with AWS Glue Data Catalog and use it as the Iceberg catalog.
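To make the catalog integration concrete, here is a minimal sketch of the Spark settings that point an Iceberg catalog at AWS Glue Data Catalog. The property keys and class names come from the Apache Iceberg AWS integration; the helper name `iceberg_glue_spark_conf`, the catalog name `glue_catalog`, and the bucket path are illustrative assumptions, not values from this text.

```python
def iceberg_glue_spark_conf(catalog_name: str, warehouse_uri: str) -> dict:
    """Build Spark settings that register Glue Data Catalog as an Iceberg catalog.

    Hypothetical helper: it only assembles configuration and does not start Spark.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (MERGE INTO, CALL procedures, etc.).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers a Spark catalog backed by Iceberg.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Stores table metadata pointers in AWS Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes data and metadata files through S3.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Root S3 location for new tables (placeholder bucket).
        f"{prefix}.warehouse": warehouse_uri,
    }


if __name__ == "__main__":
    conf = iceberg_glue_spark_conf("glue_catalog", "s3://example-bucket/warehouse/")
    for key, value in sorted(conf.items()):
        print(f"--conf {key}={value}")
```

The printed `--conf` pairs could be passed to `spark-submit` on Amazon EMR or set as job parameters in AWS Glue, assuming the Iceberg runtime and AWS bundle are available on the cluster.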

The following figure illustrates a data pipeline architecture on AWS that uses Apache Iceberg for data management. Raw data is sourced from Amazon S3 or from streaming services such as Amazon MSK (Kafka) and Amazon Kinesis. Ingestion tools such as Amazon EMR, AWS Glue, or Kinesis Data Analytics process the data, while AWS Glue Data Catalog or Lake Formation handles metadata management. The processed data is then stored in Apache Iceberg tables on S3, where consumer tools including Amazon Redshift, Athena, EMR, and SageMaker can query and analyze it.

Let’s highlight some of these services here:

Amazon Athena: An interactive query service that enables users to analyze data directly in Amazon S3 using standard SQL, eliminating the need for complex data loading or ETL processes.
Amazon EMR: A managed cluster platform that simplifies running big data frameworks like Apache Hadoop and Apache Spark on AWS to process ...
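As a sketch of what "standard SQL" looks like for Iceberg in Athena, the snippet below assembles a `CREATE TABLE` statement with the `table_type` property set to `ICEBERG` and shows one way it might be submitted with boto3. The helper names, the column list, and the database, table, and bucket names are all hypothetical; only the SQL builder runs without AWS credentials.

```python
def athena_create_iceberg_table_sql(database: str, table: str, location: str) -> str:
    """Return Athena DDL for an Iceberg table (illustrative columns only)."""
    return (
        f"CREATE TABLE {database}.{table} (\n"
        "  event_id string,\n"
        "  event_time timestamp,\n"
        "  payload string\n"
        ")\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


def run_in_athena(sql: str, output_location: str) -> str:
    """Submit a statement to Athena and return the query execution ID.

    Requires boto3 and valid AWS credentials; imported lazily so the
    SQL helper above stays usable without them.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    ddl = athena_create_iceberg_table_sql(
        "analytics_db", "events", "s3://example-bucket/warehouse/events/"
    )
    print(ddl)
```

Once such a table exists, the same table could be read from Amazon EMR or AWS Glue through the Glue Data Catalog, since the catalog entry is shared across services.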


Publisher Resources

ISBN: 9781098175412


Figure 0.
