Foreword
The lakehouse ecosystem has matured significantly over the last few years. Apache Iceberg has emerged as the main table format, especially for analytics.
Apache Iceberg brings the reliability and simplicity of SQL queries on top of data files. To achieve this, Apache Iceberg materializes the data files as tables. This opens many new possibilities: ACID transactions, schema evolution, partitioning, and time travel. A table is essentially a set of data files and metadata, so we need a way to access the metadata describing a table. That’s the primary role of a catalog: to act as a reference and to provide a pointer to the current metadata for a table. Because a commit replaces that pointer in a single step, the catalog is what makes table updates atomic.
The Iceberg Catalog is now a key component, telling clients where tables are located and how to access them safely. The catalog is the keystone of data governance: it manages table access, auditing and tracking, and atomic operations on metadata.
The Apache Iceberg REST Catalog specification has dramatically changed the catalog ecosystem by providing an interoperable approach for Iceberg, where any language or tool can use the same API. But Iceberg doesn’t provide an implementation of this specification.
That’s the purpose of Apache Polaris (incubating): first and foremost an implementation of the Iceberg REST Catalog specification, with additional features such as multi-catalog support and fine-grained access control at the catalog level.
Apache Polaris: The Definitive Guide is a timely, well-written book that perfectly presents Iceberg ...