Chapter 8. AWS Glue

AWS Glue is a fully managed data integration service that provides a streamlined way to prepare and integrate data for analytical workloads such as business intelligence (BI) and machine learning (ML). It also offers a visual interface that simplifies creating, running, and managing jobs. AWS Glue includes a scalable, serverless Data Catalog that users can rely on to manage the metadata behind their data workflows. AWS Glue 3.0 and later versions support the Apache Iceberg table format, so you can use Glue with Iceberg for a range of operations: creating Iceberg tables on object stores like Amazon Simple Storage Service (Amazon S3), reading from and writing to those tables, or simply using the Glue Data Catalog to catalog all of your Iceberg tables.

In this chapter, you will learn how to configure AWS Glue to work with Apache Iceberg tables and how to perform operations such as CREATE, READ, and INSERT.
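The three operations named above correspond to ordinary Spark SQL statements. The following is a minimal Python sketch, assuming a Glue Spark job in which a Spark catalog named glue_catalog has already been pointed at the Glue Data Catalog. The catalog, database (db), table (orders), columns, and sample row are all hypothetical, and the two helper functions are illustrative, not part of any AWS or Iceberg API.

```python
from typing import Callable, List

# Hypothetical names for illustration; replace with your own catalog, database, and table.
CATALOG = "glue_catalog"
DATABASE = "db"
TABLE = "orders"


def basic_iceberg_statements(catalog: str = CATALOG, database: str = DATABASE,
                             table: str = TABLE) -> List[str]:
    """Return Spark SQL for a CREATE, an INSERT, and a READ against one Iceberg table."""
    fqn = f"{catalog}.{database}.{table}"  # catalog.database.table
    return [
        # CREATE: in Spark SQL, an Iceberg table is declared with USING iceberg.
        f"CREATE TABLE IF NOT EXISTS {fqn} "
        "(order_id BIGINT, amount DOUBLE, order_ts TIMESTAMP) USING iceberg",
        # INSERT: append one made-up sample row.
        f"INSERT INTO {fqn} VALUES (1, 19.99, TIMESTAMP '2023-01-01 10:00:00')",
        # READ: query the table back.
        f"SELECT * FROM {fqn}",
    ]


def run_basic_iceberg_ops(sql: Callable[[str], object], **names: str) -> list:
    """Run each statement through `sql` (for example, spark.sql) and collect the results."""
    return [sql(stmt) for stmt in basic_iceberg_statements(**names)]
```

Inside a Glue job you would pass the session's own method, for example run_basic_iceberg_ops(spark.sql), and call .show() on the last result to display the rows read back. Taking the executor as a parameter keeps the statements easy to inspect or test without a cluster.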

As of this writing, AWS Glue 4.0 supports Iceberg v1.0.0, whereas AWS Glue 3.0 supports Iceberg v0.13.1.

Configuration

AWS Glue is organized around jobs, each of which represents a single unit of work that moves data from a source (which can be almost anywhere) to a destination (for our purposes, an Apache Iceberg table). In this section, we review the configuration needed to create a job that uses Apache Iceberg as a source or a destination.
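For a Glue 4.0 job, the Iceberg setup largely comes down to job parameters: --datalake-formats set to iceberg, plus a --conf parameter carrying the Spark settings that register an Iceberg catalog backed by the Glue Data Catalog. The sketch below builds those parameters and passes them to the boto3 Glue create_job call. It assumes Glue 4.0 (Glue 3.0 relied on a separate Iceberg connector and is not covered here); the catalog name glue_catalog, the helper names, and the job name, role, script location, and warehouse path you pass in are hypothetical placeholders.

```python
from typing import Any, Dict

# Hypothetical catalog name; any valid identifier can be used.
CATALOG = "glue_catalog"


def iceberg_spark_conf(warehouse: str, catalog: str = CATALOG) -> Dict[str, str]:
    """Spark settings that register an Iceberg catalog backed by the Glue Data Catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,  # default S3 location for new tables' data and metadata
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def iceberg_job_arguments(warehouse: str, catalog: str = CATALOG) -> Dict[str, str]:
    """DefaultArguments for a Glue 4.0 job that reads or writes Iceberg tables."""
    pairs = [f"{key}={value}" for key, value in iceberg_spark_conf(warehouse, catalog).items()]
    return {
        # Loads the Iceberg libraries that ship with Glue 4.0.
        "--datalake-formats": "iceberg",
        # Glue accepts a single --conf key, so further settings are chained inside its value.
        "--conf": " --conf ".join(pairs),
    }


def create_iceberg_job(glue_client: Any, name: str, role: str,
                       script_location: str, warehouse: str) -> Dict[str, Any]:
    """Create a Spark ETL job; glue_client is expected to be boto3.client("glue")."""
    return glue_client.create_job(
        Name=name,
        Role=role,
        Command={"Name": "glueetl", "ScriptLocation": script_location, "PythonVersion": "3"},
        GlueVersion="4.0",
        DefaultArguments=iceberg_job_arguments(warehouse),
    )
```

The chained " --conf " separator inside a single --conf value follows the pattern AWS describes for passing several Spark settings to a Glue job, as we understand it. The same two parameters can also be entered by hand under the job's parameters in the Glue console.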

Creating a Glue Database

The first step is to create a database in the AWS Glue Data Catalog. The Glue Data Catalog acts as a ...
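The database can be created in the Glue console or programmatically. As a minimal sketch, assuming the boto3 SDK and IAM permissions to create databases, the helper below wraps the Glue create_database call. The helper name, the database name, and the S3 location are hypothetical.

```python
from typing import Any, Optional


def create_glue_database(glue_client: Any, name: str, description: Optional[str] = None,
                         location_uri: Optional[str] = None) -> dict:
    """Create a database in the Glue Data Catalog; glue_client is boto3.client("glue")."""
    database_input = {"Name": name}
    if description:
        database_input["Description"] = description
    if location_uri:
        # Optional S3 location associated with the database.
        database_input["LocationUri"] = location_uri
    return glue_client.create_database(DatabaseInput=database_input)
```

For example, create_glue_database(boto3.client("glue"), "iceberg_db", description="Iceberg tables") creates the database; boto3 raises AlreadyExistsException if a database with that name already exists. The client is passed in rather than created inside the function so the call can be exercised with a stub.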

Apache Iceberg: The Definitive Guide by Tomer Shiran, Jason Hughes, Alex Merced
