Apache Polaris: The Definitive Guide

by Alex Merced, Andrew Madson, Tomer Shiran

September 2025

Beginner to intermediate

258 pages

5h 47m

English

O'Reilly Media, Inc.

Foreword
Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Data Lakehouses and Apache Iceberg Fundamentals
1. Data Lakehouse and Apache Iceberg
Modern Data Challenges; The World of Data Warehouses; Moving Forward with Data Lakes; The Cloud Revolution; File-Based Analytics with Apache Parquet; The Data Lakehouse Solution; The Key Benefits of a Data Lakehouse; The Path Forward: Data Lakehouse Table Formats; The Role of Table Formats; The Benefits of Table Formats; Existing Table Formats; Apache Iceberg; What Is Apache Iceberg?; Metadata File (metadata.json); Manifest List; Manifest Files; Data Files; Delete Files; Conclusion
2. The Role of Apache Iceberg Catalogs
What Is and Isn’t an Apache Iceberg Catalog; The Mechanics of Apache Iceberg Catalogs; Types of Apache Iceberg Catalogs; File-System Catalogs; Service Catalogs; Challenges of Diverse Catalog Options; Client-Side Complexity; Configuration Challenges; Authorization Challenges; The Need for a Unified Approach; The Apache Iceberg REST Catalog Specification; Key Benefits of the REST Catalog Specification; The Evolution of REST Catalog Implementations; Apache Polaris; The Birth of Apache Polaris; Polaris: A New Era of Lakehouse Catalogs; Conclusion
II. Apache Polaris
3. The Apache Polaris Security Model
What Is Polaris?; Catalogs; Key Features of Polaris Catalogs; Benefits of Multi-Catalog Architecture; Principals; What Are Principals?; Managing Principals; Principal Lifecycle; Catalog Roles; Defining Permissions in Catalog Roles; Assigning Catalog Roles to Principals; Best Practices for Catalog Roles; Principal Roles; What Are Principal Roles?; Benefits of Principal Roles; Best Practices for Principal Roles; Polaris Security Best Practices; Multi-Tenant Environments; Cross-Team Collaboration; Compliance and Sensitive Data Governance; Cloud-Native Deployments; Conclusion
4. External Catalogs
Nessie; What Makes Nessie Unique?; Why Use Nessie with Polaris?; Example: Nessie and Polaris in Action; Gravitino; What Makes Gravitino Unique?; Why Use Gravitino with Polaris?; Example: Distributed Metadata Governance; Lakekeeper; What Makes Lakekeeper Unique?; Why Use Lakekeeper with Polaris?; Example: Multi-Tenant Metadata Governance; AWS Glue; Why Use the AWS Glue Catalog?; Why Use Glue with Polaris?; Example: Hybrid Team Collaboration; Conclusion
5. Polaris REST API
Catalog Operations; List Catalogs; Create a Catalog; Get Catalog Details; Update a Catalog; Delete a Catalog; Principal Operations; List Principals; Create a Principal; Get Principal Details; Update a Principal; Delete a Principal; Rotate Principal Credentials; Managing Roles; Create a Catalog Role; Create a Principal Role; List Catalog Roles; List Roles Assigned to a Principal; List All Principal Roles; List Principals Assigned to a Principal Role; Get Catalog Roles Mapped to a Principal Role; Get Details of a Principal Role; Add a Grant to a Catalog Role; Revoke a Grant from a Catalog Role; Assign a Catalog Role to a Principal Role; Assign a Role to a Principal; Update a Principal Role; Revoke a Role from a Principal; Revoke a Catalog Role from a Principal Role; Delete a Principal Role; Delete a Catalog Role; Apache Iceberg REST Catalog Endpoints; Configuration API; OAuth2 API; Table API; View API; Conclusion
III. Hands-on with Apache Polaris

6. Working with Apache Polaris OSS
Deploying Locally with Docker; Prerequisites; Step 1: Clone the Repository; Step 2: Configure Environment Variables; Step 3: Understand the Docker Compose File; Step 4: Starting the Environment; Step 5: Stopping the Environment; Creating Catalogs; When to Create a Catalog; Creating Catalog Roles; When to Create Catalog Roles; Creating Principals; Creating Principal Roles; When to Create a Principal Role; Assigning the Catalog Role to the Principal Role and Setting Permissions on the Catalog; Summary
7. Using Apache Polaris with Apache Spark
Connecting Your Apache Polaris Catalog to Apache Spark; Using Spark Dataframe API with Apache Polaris (Incubating); Creating a Table; Querying a Table; Updating a Table; Deleting Rows; Appending Data; Reading Metadata Tables; Using SparkSQL with Apache Polaris; Creating a Table; Querying a Table; Inserting Data; Updating Data; Deleting Data; Merging Data; Reading Metadata Tables; Time Travel Queries; Using Spark Streaming with Apache Polaris; Setting Up Spark Streaming with Polaris; Streaming Reads from Polaris; Streaming Writes to Polaris; Handling Deletes and Overwrites; Using Partitioned Tables; Maintaining Streaming Tables; Conclusion
8. Using Apache Polaris with Snowflake
Establishing Connectivity Between Snowflake and Polaris; Configuring an External Volume; Creating a Polaris Catalog Integration; Querying Iceberg Tables via Snowflake and Polaris; Registering an Existing Polaris Table in Snowflake; Querying the External Iceberg Table; Using Snowflake Open Catalog (Managed Polaris); Polaris-Backed Tables vs. Native Snowflake Tables; Conclusion
9. Using Apache Polaris with Dremio
Connecting Dremio to an Apache Polaris Catalog; Connecting Polaris Using the REST Catalog Connector; Connecting Snowflake’s Open Catalog to Dremio; Why Disable Use Vended Credentials?; Using Dremio SQL with Apache Polaris; Querying Iceberg Tables via Polaris; Querying the Iceberg Metadata Tables; Creating Tables and CTAS in Polaris via Dremio; Adding Data from Files to a Table Using Copy Into; Maintaining Your Iceberg Tables with Dremio; Dremio Automates Optimization; Conclusion
10. Advanced Polaris Configuration and CLI Management
Using the Polaris CLI; CLI Structure, Authentication, and Profiles; Managing Entities with the CLI; Understanding Realms; Observability: Metrics, Tracing, and Logging; Metrics with Micrometer and Prometheus; Tracing with OpenTelemetry; Logging and Debugging with Quarkus; Configuring Polaris for Production; Security and Authentication Configuration; Durable Metadata with Metastores; Hardening Defaults and Managing Feature Flags; Scaling, Concurrency, and Rate Limits; Finalizing and Verifying Your Production Setup; Conclusion
11. Looking to the Future of Apache Polaris
Managed Polaris; The REST Catalog Ecosystem; Data Processing Engines; Streaming and Ingestion Platforms; Other Data-Stack Tools; The Apache Polaris Roadmap; Generic Table Support; Policy Store; Table Maintenance Framework; SQL and NoSQL Persistence; S3-Compatible Storage Support; Catalog UI; Federated Catalogs; Federated Role Support; Polaris Event Listeners; Unstructured Data in Polaris; Conclusion
Index
About the Authors

Content preview from Apache Polaris: The Definitive Guide

Chapter 2. The Role of Apache Iceberg Catalogs

As we’ve seen in the previous chapter, Apache Iceberg brings powerful table management capabilities to data lakehouses, enabling reliable, scalable data operations with features like ACID transactions, schema evolution, and time travel. But to fully unlock the potential of Iceberg tables, we need a way to manage and organize them across the vast and diverse ecosystem of lakehouse tools. This is where Apache Iceberg catalogs come in, providing the final piece of the lakehouse puzzle.

Iceberg catalogs act as a centralized layer that tracks, organizes, and governs the growing number of tables in a lakehouse environment. They make tables discoverable by different tools and frameworks, ensuring that data engineers, analysts, and other users can easily access the latest state of any table, regardless of where the data resides. Without catalogs, managing large-scale datasets across different query engines and environments would become chaotic and error prone, because no single place would hold a unified view of table metadata, versions, and schema changes.
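
To make the tracking role concrete, here is a minimal sketch of the core idea, assuming a catalog is essentially a map from a table identifier to the location of that table's current metadata file. The `ToyCatalog` class, its method names, and the `s3://example-bucket/...` paths are all hypothetical illustrations, not part of Iceberg or Polaris; a real catalog adds persistence, namespaces, and access control.

```python
from threading import Lock


class CommitConflict(Exception):
    """Raised when a writer commits against a stale view of the table."""


class ToyCatalog:
    """Hypothetical in-memory catalog: table identifier -> current metadata file."""

    def __init__(self):
        self._pointers = {}
        self._lock = Lock()

    def register(self, name, metadata_location):
        # Make a table discoverable by recording where its metadata lives.
        with self._lock:
            if name in self._pointers:
                raise ValueError(f"table already exists: {name}")
            self._pointers[name] = metadata_location

    def list_tables(self):
        # Every engine sees the same list of tables.
        with self._lock:
            return sorted(self._pointers)

    def current_metadata(self, name):
        # Readers always ask the catalog for the latest table state.
        with self._lock:
            return self._pointers[name]

    def commit(self, name, expected, new):
        # Swap the pointer only if the writer saw the latest state.
        with self._lock:
            if self._pointers[name] != expected:
                raise CommitConflict(name)
            self._pointers[name] = new


catalog = ToyCatalog()
v1 = "s3://example-bucket/orders/metadata/v1.metadata.json"  # assumed path
v2 = "s3://example-bucket/orders/metadata/v2.metadata.json"  # assumed path
catalog.register("sales.orders", v1)

# A writer that read v1 publishes v2.
catalog.commit("sales.orders", expected=v1, new=v2)

# A second writer still holding v1 is rejected instead of overwriting v2.
try:
    catalog.commit("sales.orders", expected=v1, new="s3://example-bucket/other.json")
    stale_commit_rejected = False
except CommitConflict:
    stale_commit_rejected = True

print(catalog.list_tables(), catalog.current_metadata("sales.orders"))
```

The compare-and-swap in `commit` is the design choice worth noticing: because every engine reads and updates one pointer per table, concurrent writers cannot silently overwrite each other's changes.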

More than just a tracking system, Iceberg catalogs provide a governance layer that enforces access controls and auditability across your lakehouse. Iceberg catalogs can ensure that the right users have the appropriate access to the correct data, all while providing the transparency needed for regulatory compliance and operational security. In this chapter, we will explore how Iceberg catalogs enable ...

Publisher Resources

ISBN: 9798341608139
