Apache Polaris: The Definitive Guide

by Alex Merced, Andrew Madson, Tomer Shiran

September 2025

Beginner to intermediate

258 pages

5h 47m

English

O'Reilly Media, Inc.

Foreword
Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Data Lakehouses and Apache Iceberg Fundamentals
1. Data Lakehouse and Apache Iceberg
Modern Data Challenges; The World of Data Warehouses; Moving Forward with Data Lakes; The Cloud Revolution; File-Based Analytics with Apache Parquet; The Data Lakehouse Solution; The Key Benefits of a Data Lakehouse; The Path Forward: Data Lakehouse Table Formats; The Role of Table Formats; The Benefits of Table Formats; Existing Table Formats; Apache Iceberg; What Is Apache Iceberg?; Metadata File (metadata.json); Manifest List; Manifest Files; Data Files; Delete Files; Conclusion
2. The Role of Apache Iceberg Catalogs
What Is and Isn’t an Apache Iceberg Catalog; The Mechanics of Apache Iceberg Catalogs; Types of Apache Iceberg Catalogs; File-System Catalogs; Service Catalogs; Challenges of Diverse Catalog Options; Client-Side Complexity; Configuration Challenges; Authorization Challenges; The Need for a Unified Approach; The Apache Iceberg REST Catalog Specification; Key Benefits of the REST Catalog Specification; The Evolution of REST Catalog Implementations; Apache Polaris; The Birth of Apache Polaris; Polaris: A New Era of Lakehouse Catalogs; Conclusion
II. Apache Polaris
3. The Apache Polaris Security Model
What Is Polaris?; Catalogs; Key Features of Polaris Catalogs; Benefits of Multi-Catalog Architecture; Principals; What Are Principals?; Managing Principals; Principal Lifecycle; Catalog Roles; Defining Permissions in Catalog Roles; Assigning Catalog Roles to Principals; Best Practices for Catalog Roles; Principal Roles; What Are Principal Roles?; Benefits of Principal Roles; Best Practices for Principal Roles; Polaris Security Best Practices; Multi-Tenant Environments; Cross-Team Collaboration; Compliance and Sensitive Data Governance; Cloud-Native Deployments; Conclusion
4. External Catalogs
Nessie; What Makes Nessie Unique?; Why Use Nessie with Polaris?; Example: Nessie and Polaris in Action; Gravitino; What Makes Gravitino Unique?; Why Use Gravitino with Polaris?; Example: Distributed Metadata Governance; Lakekeeper; What Makes Lakekeeper Unique?; Why Use Lakekeeper with Polaris?; Example: Multi-Tenant Metadata Governance; AWS Glue; Why Use the AWS Glue Catalog?; Why Use Glue with Polaris?; Example: Hybrid Team Collaboration; Conclusion
5. Polaris REST API
Catalog Operations; List Catalogs; Create a Catalog; Get Catalog Details; Update a Catalog; Delete a Catalog; Principal Operations; List Principals; Create a Principal; Get Principal Details; Update a Principal; Delete a Principal; Rotate Principal Credentials; Managing Roles; Create a Catalog Role; Create a Principal Role; List Catalog Roles; List Roles Assigned to a Principal; List All Principal Roles; List Principals Assigned to a Principal Role; Get Catalog Roles Mapped to a Principal Role; Get Details of a Principal Role; Add a Grant to a Catalog Role; Revoke a Grant from a Catalog Role; Assign a Catalog Role to a Principal Role; Assign a Role to a Principal; Update a Principal Role; Revoke a Role from a Principal; Revoke a Catalog Role from a Principal Role; Delete a Principal Role; Delete a Catalog Role; Apache Iceberg REST Catalog Endpoints; Configuration API; OAuth2 API; Table API; View API; Conclusion
III. Hands-on with Apache Polaris

6. Working with Apache Polaris OSS
Deploying Locally with Docker; Prerequisites; Step 1: Clone the Repository; Step 2: Configure Environment Variables; Step 3: Understand the Docker Compose File; Step 4: Starting the Environment; Step 5: Stopping the Environment; Creating Catalogs; When to Create a Catalog; Creating Catalog Roles; When to Create Catalog Roles; Creating Principals; Creating Principal Roles; When to Create a Principal Role; Assigning the Catalog Role to the Principal Role and Setting Permissions on the Catalog; Summary
7. Using Apache Polaris with Apache Spark
Connecting Your Apache Polaris Catalog to Apache Spark; Using Spark Dataframe API with Apache Polaris (Incubating); Creating a Table; Querying a Table; Updating a Table; Deleting Rows; Appending Data; Reading Metadata Tables; Using SparkSQL with Apache Polaris; Creating a Table; Querying a Table; Inserting Data; Updating Data; Deleting Data; Merging Data; Reading Metadata Tables; Time Travel Queries; Using Spark Streaming with Apache Polaris; Setting Up Spark Streaming with Polaris; Streaming Reads from Polaris; Streaming Writes to Polaris; Handling Deletes and Overwrites; Using Partitioned Tables; Maintaining Streaming Tables; Conclusion
8. Using Apache Polaris with Snowflake
Establishing Connectivity Between Snowflake and Polaris; Configuring an External Volume; Creating a Polaris Catalog Integration; Querying Iceberg Tables via Snowflake and Polaris; Registering an Existing Polaris Table in Snowflake; Querying the External Iceberg Table; Using Snowflake Open Catalog (Managed Polaris); Polaris-Backed Tables vs. Native Snowflake Tables; Conclusion
9. Using Apache Polaris with Dremio
Connecting Dremio to an Apache Polaris Catalog; Connecting Polaris Using the REST Catalog Connector; Connecting Snowflake’s Open Catalog to Dremio; Why Disable Use Vended Credentials?; Using Dremio SQL with Apache Polaris; Querying Iceberg Tables via Polaris; Querying the Iceberg Metadata Tables; Creating Tables and CTAS in Polaris via Dremio; Adding Data from Files to a Table Using Copy Into; Maintaining Your Iceberg Tables with Dremio; Dremio Automates Optimization; Conclusion
10. Advanced Polaris Configuration and CLI Management
Using the Polaris CLI; CLI Structure, Authentication, and Profiles; Managing Entities with the CLI; Understanding Realms; Observability: Metrics, Tracing, and Logging; Metrics with Micrometer and Prometheus; Tracing with OpenTelemetry; Logging and Debugging with Quarkus; Configuring Polaris for Production; Security and Authentication Configuration; Durable Metadata with Metastores; Hardening Defaults and Managing Feature Flags; Scaling, Concurrency, and Rate Limits; Finalizing and Verifying Your Production Setup; Conclusion
11. Looking to the Future of Apache Polaris
Managed Polaris; The REST Catalog Ecosystem; Data Processing Engines; Streaming and Ingestion Platforms; Other Data-Stack Tools; The Apache Polaris Roadmap; Generic Table Support; Policy Store; Table Maintenance Framework; SQL and NoSQL Persistence; S3-Compatible Storage Support; Catalog UI; Federated Catalogs; Federated Role Support; Polaris Event Listeners; Unstructured Data in Polaris; Conclusion
Index
About the Authors

Content preview from Apache Polaris: The Definitive Guide

Foreword

The lakehouse ecosystem has matured significantly over the last few years. Apache Iceberg emerged as the main table format, especially for analytics.

Apache Iceberg brings the reliability and simplicity of SQL queries to data files. To achieve this, Iceberg presents data files as tables, which opens up many new possibilities: ACID transactions, schema evolution, partitioning, and time travel. A table is essentially a set of data files plus metadata, so we need a way to access the metadata that describes a table. That’s the primary role of a catalog: to act as a reference that provides a pointer to a table's metadata and, because that pointer is updated atomically, to provide atomicity.

The Iceberg catalog is now a key component: it tells engines where tables are located and how to access them safely. The catalog is the keystone of data governance, managing table access, auditing and tracking, and atomic operations on metadata.

The Apache Iceberg REST Catalog specification has dramatically changed the catalog ecosystem by providing an interoperable approach for Iceberg, where any language or tool can use the same API. But Iceberg doesn’t provide an implementation of this specification.

That’s the purpose of Apache Polaris (incubating): it is first an implementation of the Iceberg REST Catalog specification, with additional features such as multi-catalog support and fine-grained access control at the catalog level.

Apache Polaris: The Definitive Guide is a timely, well-written book that perfectly presents Iceberg ...


Publisher Resources

ISBN: 9798341608139
