Chapter 3. Basic Operations on Delta Tables

Delta tables can be created in a variety of ways. How you create your tables largely depends on your familiarity with the toolset. If you are primarily a SQL developer, you can use the SQL CREATE TABLE statement to create a Delta table, while Python users may prefer the DataFrameWriter API or the fine-grained, easy-to-use DeltaTableBuilder API.
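The three creation routes named above can be sketched side by side. This is a minimal illustration, not a definitive recipe: it assumes `spark` is a SparkSession already configured for Delta Lake, and the table name `demo_db.trips` and its three columns are hypothetical.

```python
def create_with_sql(spark, table_name="demo_db.trips"):
    """Create an empty Delta table with a SQL CREATE TABLE statement."""
    # Hypothetical schema, used only for illustration.
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {table_name} ("
        "trip_id BIGINT, vendor_id INT, fare DOUBLE"
        ") USING DELTA"
    )
    spark.sql(ddl)
    return ddl


def create_with_writer(df, table_name="demo_db.trips"):
    """Create (or replace) a Delta table from an existing DataFrame."""
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)


def create_with_builder(spark, table_name="demo_db.trips"):
    """Create the same table column by column with DeltaTableBuilder."""
    # Imported lazily so the sketch loads even without delta-spark installed.
    from delta.tables import DeltaTable

    return (
        DeltaTable.createIfNotExists(spark)
        .tableName(table_name)
        .addColumn("trip_id", "BIGINT")
        .addColumn("vendor_id", "INT")
        .addColumn("fare", "DOUBLE")
        .execute()
    )
```

The SQL and builder variants define the schema explicitly, whereas the DataFrameWriter variant takes the schema from the DataFrame being written.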

When creating tables you can define GENERATED columns, whose values are computed automatically from a user-specified function over other columns in the Delta table. While some restrictions apply, generated columns are a powerful way to enrich your Delta table schemas.
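As a rough sketch of a generated column, the function below derives a date column from a timestamp column. It assumes `spark` is a Delta-enabled SparkSession and that your runtime accepts the `GENERATED ALWAYS AS` clause in SQL DDL; the table and column names are hypothetical.

```python
def create_table_with_generated_column(spark, table_name="demo_db.events"):
    """Create a Delta table whose event_date is derived from event_time."""
    # Hypothetical schema: event_date is never supplied by the writer; it is
    # computed from event_time whenever a row is written.
    ddl = f"""
    CREATE TABLE IF NOT EXISTS {table_name} (
        event_id BIGINT,
        event_time TIMESTAMP,
        event_date DATE GENERATED ALWAYS AS (CAST(event_time AS DATE))
    ) USING DELTA
    """
    spark.sql(ddl)
    return ddl
```

A writer would then supply only `event_id` and `event_time`, leaving `event_date` to be filled in from the expression.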

Delta tables can be read with standard ANSI SQL or with the popular PySpark DataFrameReader API. You can write to a Delta table by using the classic SQL INSERT statement, or you can append a DataFrame to the table. Finally, the SQL COPY INTO command is a great way to append large amounts of data quickly.
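The read and write paths described above might look like the following sketch. It assumes a Delta-enabled SparkSession named `spark`; the table name, path, and three-column row shape are hypothetical. COPY INTO is left out because its availability depends on the platform you run on.

```python
def read_with_sql(spark, table_name):
    """Read a Delta table with plain SQL."""
    return spark.sql(f"SELECT * FROM {table_name}")


def read_with_reader(spark, table_path):
    """Read a Delta table by path with the DataFrameReader API."""
    return spark.read.format("delta").load(table_path)


def insert_with_sql(spark, table_name, trip_id, vendor_id, fare):
    """Insert one row with a SQL INSERT statement (hypothetical columns)."""
    # Values are coerced to numbers before being placed in the statement.
    stmt = (
        f"INSERT INTO {table_name} "
        f"VALUES ({int(trip_id)}, {int(vendor_id)}, {float(fare)})"
    )
    spark.sql(stmt)
    return stmt


def append_dataframe(df, table_name):
    """Append all rows of a DataFrame to an existing Delta table."""
    df.write.format("delta").mode("append").saveAsTable(table_name)
```

The DataFrame passed to `append_dataframe` is expected to have a schema compatible with the target table.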

Partitioning a Delta table on the columns that your most frequent queries filter on can dramatically improve query and DML performance. The individual files that make up your Delta table will be organized in subdirectories that align with the values of your partitioning columns.
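To make the directory layout concrete, here is a small sketch. `write_partitioned` assumes a PySpark DataFrame and a hypothetical target path; `partition_directory` is a hypothetical helper that only builds the conventional `column=value` subdirectory name and assumes the table uses that default layout.

```python
import posixpath


def write_partitioned(df, table_path, partition_columns):
    """Write a DataFrame as a Delta table partitioned by the given columns."""
    (
        df.write.format("delta")
        .partitionBy(*partition_columns)
        .mode("overwrite")
        .save(table_path)
    )


def partition_directory(table_path, **partition_values):
    """Build the subdirectory that would hold files for one partition.

    Each partition column contributes one `column=value` path segment,
    nested in the order the columns are given.
    """
    segments = [f"{col}={val}" for col, val in partition_values.items()]
    return posixpath.join(table_path, *segments)
```

For example, `partition_directory("/data/trips", vendor_id=4)` yields `/data/trips/vendor_id=4`, which is where a query filtering on `vendor_id = 4` would need to look.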

Delta Lake allows you to associate custom metadata with the commit entries in your transaction log. This can be leveraged to tag sensitive commits for auditing purposes. You can also store custom tags in your table properties, so just like you can have ...
