Chapter 3. Basic Operations on Delta Tables
Delta tables can be created in a variety of ways. How you create your tables largely depends on your familiarity with the toolset. If you are primarily a SQL developer, you can use SQL’s CREATE TABLE to create a Delta table, while Python users may prefer the DataFrameWriter API or the fine-grained and easy-to-use DeltaTableBuilder API.
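As a rough illustration of the three creation routes mentioned above, the sketch below assumes an existing SparkSession configured with the delta-spark package. The table name `demo.trips`, its columns, and all helper function names are hypothetical and chosen only for this example; only the SQL-building helper runs without Spark.

```python
def create_table_sql(table, columns):
    """Build a CREATE TABLE statement for a Delta table.

    `columns` maps column names to Spark SQL type names.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}) USING DELTA"


def create_with_sql(spark, table, columns):
    # SQL route: run the DDL through an existing SparkSession.
    spark.sql(create_table_sql(table, columns))


def create_with_writer(df, table):
    # DataFrameWriter route: the table schema is taken from the DataFrame.
    df.write.format("delta").mode("overwrite").saveAsTable(table)


def create_with_builder(spark, table, columns):
    # DeltaTableBuilder route: requires the delta-spark package.
    from delta.tables import DeltaTable

    builder = DeltaTable.createIfNotExists(spark).tableName(table)
    for name, dtype in columns.items():
        builder = builder.addColumn(name, dtype)
    builder.execute()


# Hypothetical table and columns, used only for illustration.
print(create_table_sql("demo.trips", {"trip_id": "INT", "fare": "DOUBLE"}))
```

The three helpers produce the same kind of table; the builder route is the one that lets you set per-column details programmatically.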
When creating tables, you can define GENERATED columns, whose values are automatically computed by a user-specified function over other columns in the Delta table. While some restrictions apply, generated columns are a powerful way to enrich your Delta table schemas.
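A minimal sketch of a generated column follows, assuming a SparkSession with delta-spark available. The `orders` schema, the `order_date` column derived from `order_ts`, and the helper names are hypothetical; the SQL uses the `GENERATED ALWAYS AS` clause, and the builder variant passes the same expression through `generatedAlwaysAs`.

```python
def generated_column(name, dtype, expression):
    """Return the DDL fragment for a generated column."""
    return f"{name} {dtype} GENERATED ALWAYS AS ({expression})"


def create_orders_sql(table):
    # order_date is computed from order_ts whenever a row is written.
    columns = [
        "order_id INT",
        "order_ts TIMESTAMP",
        generated_column("order_date", "DATE", "CAST(order_ts AS DATE)"),
    ]
    return f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)}) USING DELTA"


def create_orders_with_builder(spark, table):
    # Same table through the DeltaTableBuilder API (requires delta-spark).
    from delta.tables import DeltaTable

    (
        DeltaTable.createIfNotExists(spark)
        .tableName(table)
        .addColumn("order_id", "INT")
        .addColumn("order_ts", "TIMESTAMP")
        .addColumn("order_date", "DATE", generatedAlwaysAs="CAST(order_ts AS DATE)")
        .execute()
    )


# Hypothetical table name, used only for illustration.
print(create_orders_sql("demo.orders"))
```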
Delta tables can be read with standard ANSI SQL or with the popular PySpark DataFrameReader API. You can write to a Delta table with the classic SQL INSERT statement, or you can append a DataFrame to the table. Finally, the SQL COPY INTO command is a great way to append large amounts of data quickly.
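The read and write paths described above might look roughly like the following. This is a sketch under assumptions: `spark` is an existing SparkSession with Delta support, the helper names, table name, and source path are hypothetical, and the `COPY INTO` statement assumes an environment that supports that command, such as Databricks SQL.

```python
def read_table(spark, table):
    # DataFrameReader route; spark.sql(f"SELECT * FROM {table}") is the SQL equivalent.
    return spark.read.table(table)


def insert_rows(spark, table, values_sql):
    # Classic SQL INSERT; `values_sql` is an already-written VALUES list,
    # for example "(1, 12.5), (2, 8.0)".
    spark.sql(f"INSERT INTO {table} VALUES {values_sql}")


def append_dataframe(df, table):
    # Append the rows of a DataFrame to an existing Delta table.
    df.write.format("delta").mode("append").saveAsTable(table)


def copy_into_sql(table, source_path, file_format="PARQUET"):
    # Build a COPY INTO statement for bulk-loading files from a directory.
    return f"COPY INTO {table} FROM '{source_path}' FILEFORMAT = {file_format}"


# Hypothetical table and path, used only for illustration.
print(copy_into_sql("demo.trips", "/data/trips/2023"))
```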
Partitioning a Delta table based on your most frequent query patterns can dramatically improve query and DML performance. The individual files that make up your Delta table are organized in subdirectories that correspond to the values of your partitioning columns.
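To make the directory layout concrete, the pure-Python sketch below mimics how rows map to `column=value` subdirectories under a partitioned table. It does not write any Delta files; the sample rows and the `year` and `month` partition columns are hypothetical. In PySpark, the corresponding write would typically be `df.write.format("delta").partitionBy("year", "month").save(path)`.

```python
from collections import defaultdict


def partition_path(row, partition_cols):
    """Return the relative subdirectory a row would land in."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def group_by_partition(rows, partition_cols):
    """Group rows by the partition subdirectory they belong to."""
    layout = defaultdict(list)
    for row in rows:
        layout[partition_path(row, partition_cols)].append(row)
    return dict(layout)


# Hypothetical sample data, used only for illustration.
rows = [
    {"year": 2023, "month": 7, "fare": 12.5},
    {"year": 2023, "month": 7, "fare": 8.0},
    {"year": 2023, "month": 8, "fare": 20.0},
]

for directory, members in group_by_partition(rows, ["year", "month"]).items():
    print(directory, len(members))
```

A query that filters on `year` and `month` only needs to touch the matching subdirectories, which is where the performance gain comes from.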
Delta Lake allows you to associate custom metadata with the commit entries in your transaction log. This can be leveraged to tag sensitive commits for auditing purposes. You can also store custom tags in your table properties, so just like you can have ...