Chapter 1. Rethinking the Lakehouse
The Pain Points of Today’s Lakehouses
Imagine setting up a lakehouse attached to a cloud object store in less than a minute. No Apache Spark, no Java, no catalog wiring, no distributed architecture to manage. Just a couple of lines of SQL and you are ready. It might sound absurdly optimistic, but it is not. DuckLake makes this a reality, and here’s the proof. With just two lines of SQL, you will have a DuckLake lakehouse wired up to Google Cloud Storage (GCS), ready to create and manage tables:
CREATE OR REPLACE SECRET gcs_creds (TYPE GCS, KEY_ID getenv('GCS_KEY'), SECRET getenv('GCS_SECRET'));
ATTACH OR REPLACE 'ducklake:gcs_wh.ducklake' AS gcs_wh (DATA_PATH getenv('GCS_WHS_PATH'));
If you have built lakehouses with other formats, the simplicity illustrated here might be hard to believe. Why does DuckLake take only a few lines of code to set up, while Apache Iceberg and Delta Lake require pages of configuration? The answer boils down to a core architectural decision: where the metadata ...