8

The Delta Lake

Up to this point, we have focused solely on enabling you to use Databricks SQL. Now that we have accomplished that, let’s investigate the technologies that enable Databricks SQL to run your data warehousing workloads on what seems to be a data lake.

In this chapter, we will focus on the primary storage format of the Databricks Lakehouse: Delta Lake. Why should you care? Because, unlike other cloud data warehouses, the Databricks Lakehouse stores data in open storage formats such as Delta Lake, Parquet, Optimized Row Columnar (ORC), and comma-separated values (CSV), instead of proprietary formats.

We will begin by understanding the challenges posed by using other storage formats, how they affect ...

Get Business Intelligence with Databricks SQL now with the O’Reilly learning platform.
