Delta Lake: Up and Running

by Bennie Haelen, Dan Davis
October 2023
Beginner to intermediate
264 pages
6h 45m
English
O'Reilly Media, Inc.
Content preview from Delta Lake: Up and Running

Chapter 3. Basic Operations on Delta Tables

Delta tables can be created in a variety of ways. How you create your tables largely depends on your familiarity with the toolset. If you are primarily a SQL developer, you can use SQL’s CREATE TABLE statement to create a Delta table, while Python users may prefer the DataFrameWriter API or the fine-grained and easy-to-use DeltaTableBuilder API.
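
As a rough illustration of the three routes just mentioned, the sketch below creates the same table with SQL, with the DataFrameWriter, and with the DeltaTableBuilder. The table name `demo.trips`, its columns, and the helper function names are hypothetical and chosen only for this example; it assumes a Spark session already configured for Delta Lake and an existing `demo` schema.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; nothing here needs Spark at import time
    from pyspark.sql import DataFrame, SparkSession

# Hypothetical table name and columns, used only for illustration.
TABLE = "demo.trips"

CREATE_TABLE_SQL = f"""
CREATE TABLE IF NOT EXISTS {TABLE} (
    trip_id BIGINT,
    pickup_ts TIMESTAMP,
    fare DOUBLE
) USING DELTA
"""


def create_with_sql(spark: "SparkSession") -> None:
    """SQL route: a plain CREATE TABLE statement with USING DELTA."""
    spark.sql(CREATE_TABLE_SQL)


def create_with_writer(df: "DataFrame") -> None:
    """DataFrameWriter route: the schema is taken from the DataFrame."""
    df.write.format("delta").mode("overwrite").saveAsTable(TABLE)


def create_with_builder(spark: "SparkSession") -> None:
    """DeltaTableBuilder route: columns are declared one at a time."""
    # Imported lazily so the module loads without delta-spark installed.
    from delta.tables import DeltaTable

    (
        DeltaTable.createIfNotExists(spark)
        .tableName(TABLE)
        .addColumn("trip_id", "BIGINT")
        .addColumn("pickup_ts", "TIMESTAMP")
        .addColumn("fare", "DOUBLE")
        .execute()
    )
```

Only one of the three functions would normally be called for a given table; they are shown side by side to make the trade-off visible. The SQL and builder routes state the schema explicitly, while the writer route infers it from the data.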

When creating tables you can define GENERATED columns, the values of which are automatically generated based on a user-specified function over other columns in the Delta table. While some restrictions apply, generated columns are a powerful way to enrich your Delta table schemas.
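
A generated column might be declared along the following lines. This is a minimal sketch using the `generatedAlwaysAs` argument of `DeltaTableBuilder.addColumn`; the table name, the column names, and the `add_columns` helper are assumptions made for illustration, not taken from the book.

```python
# Hypothetical column specs: (name, SQL type, generation expression or None).
COLUMNS = [
    ("trip_id", "BIGINT", None),
    ("pickup_ts", "TIMESTAMP", None),
    # pickup_date is derived from pickup_ts on every write.
    ("pickup_date", "DATE", "CAST(pickup_ts AS DATE)"),
]


def add_columns(builder):
    """Add every column in COLUMNS to a DeltaTableBuilder and return it."""
    for name, dtype, expr in COLUMNS:
        if expr is None:
            builder = builder.addColumn(name, dtype)
        else:
            builder = builder.addColumn(name, dtype, generatedAlwaysAs=expr)
    return builder


def create_trips_table(spark, table_name="demo.trips_generated"):
    """Create the table; assumes a Delta-enabled Spark session."""
    # Imported lazily so the module loads without delta-spark installed.
    from delta.tables import DeltaTable

    builder = DeltaTable.createIfNotExists(spark).tableName(table_name)
    add_columns(builder).execute()
```

Rows written without a `pickup_date` value should have it computed from `pickup_ts`; rows that supply a conflicting value are expected to be rejected by the constraint Delta Lake attaches to the generated column.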

Delta tables can be read with standard ANSI SQL or with the popular PySpark DataFrameReader API. You can write to a Delta table with the classic SQL INSERT statement, or you can append a DataFrame to the table. Finally, the SQL COPY INTO command is a great way to append large amounts of data quickly.
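
The read and write paths described here can be sketched as four small helpers. The function names and the two-column table they assume (`trip_id`, `fare`) are hypothetical; COPY INTO is left out because its availability depends on the runtime you are using.

```python
def read_with_sql(spark, table):
    """Read a Delta table through SQL; returns a DataFrame."""
    return spark.sql(f"SELECT * FROM {table}")


def read_with_reader(spark, path):
    """Read a Delta table by path through the DataFrameReader API."""
    return spark.read.format("delta").load(path)


def insert_row(spark, table, trip_id, fare):
    """Insert one row with a classic SQL INSERT statement.

    Values are interpolated directly for brevity; this is only
    suitable for trusted, numeric inputs such as the ones shown here.
    """
    spark.sql(f"INSERT INTO {table} VALUES ({trip_id}, {fare})")


def append_dataframe(df, table):
    """Append the rows of a DataFrame to an existing Delta table."""
    df.write.format("delta").mode("append").saveAsTable(table)
```

The append path is usually the better fit for bulk loads, since a single write commits all rows of the DataFrame at once, whereas each INSERT statement produces its own commit.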

Partitioning a Delta table on the columns that your most frequent queries filter by can dramatically improve query and DML performance. The individual files that make up your Delta table are organized in subdirectories that correspond to the values of your partitioning columns.
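
To make the directory layout concrete, here is a small sketch. `write_partitioned` uses the standard `partitionBy` method of the DataFrameWriter; `partition_dir` is a hypothetical helper that only mimics the `column=value` naming of the subdirectories and ignores the escaping applied to special characters. The column name `pickup_month` is an assumption and must exist in the DataFrame.

```python
def write_partitioned(df, path, column="pickup_month"):
    """Write a DataFrame as a Delta table partitioned by one column."""
    (
        df.write.format("delta")
        .mode("overwrite")
        .partitionBy(column)
        .save(path)
    )


def partition_dir(table_path, **partition_values):
    """Return the subdirectory expected for the given partition values.

    Illustration only: real paths also escape special characters.
    """
    parts = [f"{col}={val}" for col, val in partition_values.items()]
    return "/".join([table_path.rstrip("/"), *parts])


# Files for March trips would be expected under this directory.
print(partition_dir("/data/trips", pickup_month=3))
# prints "/data/trips/pickup_month=3"
```

A query that filters on `pickup_month` can then skip every subdirectory whose value does not match, which is where the performance gain comes from.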

Delta Lake allows you to associate custom metadata with the commit entries in your transaction log. This can be leveraged to tag sensitive commits for auditing purposes. You can also store custom tags in your table properties, so just like you can have ...
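
One way to attach custom metadata to a commit, sketched under assumptions: the `userMetadata` writer option and the `userMetadata` column of the table history are part of Delta Lake, while the helper names, the table name, and the tag values are hypothetical.

```python
def append_with_tag(df, table, tag):
    """Append a DataFrame and record a free-text tag on the commit."""
    (
        df.write.format("delta")
        .mode("append")
        .option("userMetadata", tag)
        .saveAsTable(table)
    )


def commit_tags(spark, table):
    """Return the version and userMetadata of each commit on the table."""
    # Imported lazily so the module loads without delta-spark installed.
    from delta.tables import DeltaTable

    history = DeltaTable.forName(spark, table).history()
    return history.select("version", "userMetadata")


def set_table_tag_sql(table, key, value):
    """Build the SQL that stores a custom tag in the table properties."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('{key}' = '{value}')"
```

For auditing, a call such as `append_with_tag(df, "demo.trips", "gdpr-sensitive")` would mark the commit, and `commit_tags(spark, "demo.trips")` could later be filtered on that value. The statement returned by `set_table_tag_sql` would be passed to `spark.sql`.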

ISBN: 9781098139711