Chapter 8. Using Apache Polaris with Snowflake
In this chapter, we explore how to integrate Apache Polaris with Snowflake so that you can query Iceberg tables from Snowflake’s platform. We’ll walk through four topics: setting up Snowflake to connect to a Polaris catalog (either a self-hosted Polaris OSS instance or Snowflake’s managed Polaris service, Open Catalog), configuring the necessary external resources, running SQL queries on Polaris-managed Iceberg tables from Snowflake, and understanding the differences between Polaris-backed tables and native Snowflake tables. By the end, you will be able to query Iceberg tables from Snowflake using Polaris as the metadata catalog, and you will understand the trade-offs between this approach and Snowflake’s native table storage.
Before you begin, you should have a running Apache Polaris service with an Iceberg catalog and at least one table available. This can be a self-hosted Polaris instance (as set up in previous chapters) or the Snowflake Open Catalog. You will also need a Snowflake account with a role that can create integrations (ACCOUNTADMIN, or a custom role that has been granted the CREATE INTEGRATION privilege on the account; ORGADMIN on its own does not carry this privilege) and access to the cloud storage where the Iceberg data resides.
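If you would rather not do the setup work as ACCOUNTADMIN, one option is to delegate the required privileges to a dedicated role. The following is a minimal sketch under stated assumptions: the role name polaris_integration_admin and the user name my_user are hypothetical placeholders, and the external volume grant only matters if your setup also creates an external volume for the storage location.

```sql
-- Run once as ACCOUNTADMIN to delegate setup privileges.
USE ROLE ACCOUNTADMIN;

-- Hypothetical role name; choose one that fits your naming conventions.
CREATE ROLE IF NOT EXISTS polaris_integration_admin;

-- Allows the role to create integrations at the account level.
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE polaris_integration_admin;

-- Assumption: the setup also needs an external volume for the Iceberg data location.
GRANT CREATE EXTERNAL VOLUME ON ACCOUNT TO ROLE polaris_integration_admin;

-- Hypothetical user name; replace with the user who will perform the setup.
GRANT ROLE polaris_integration_admin TO USER my_user;
```

Delegating in this way keeps day-to-day work out of ACCOUNTADMIN while still allowing the account-level objects to be created.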
Establishing Connectivity Between Snowflake and Polaris
To query data managed by Apache Polaris, Snowflake must connect to the Polaris REST Catalog API. Snowflake treats Polaris as an external Iceberg catalog: it retrieves table metadata from Polaris and reads the data files from cloud storage. This integration is achieved via two Snowflake objects: ...