Chapter 6. Using Time Travel

Having worked with databases and tables before, odds are you have had that immediate sense of panic when you forgot a WHERE clause and accidentally ran a DELETE or UPDATE statement against an entire table. We have all been there. Or you may have wondered what your data or schema looked like at a specific point in time for auditing, tracking, or analysis purposes.

Given how data is constantly changing, the following scenarios are common occurrences that, historically, have been difficult to solve or answer:

Regulatory

Auditing and regulatory compliance can require that data be stored and retrieved for many years or can require that you track certain changes to your data (e.g., GDPR).

Reproduce experiments and reports

There are often requirements for data scientists or analysts to re-create reports or machine learning experiments and model outputs given a specific set of data at a specific point in time.

Rollbacks

Accidental or bad DML operations on your data, such as INSERT, UPDATE, DELETE, and MERGE, can require fixes and rollbacks to a previous state.

Time-series analysis

Reporting needs can require you to look back or analyze data over time, for example, how many new customers were added over the course of a month.

Debugging

Troubleshooting ETL pipelines, data quality issues, or broken processes where the specific cause may only be observable in a historical state.

The ability to easily traverse through different versions of data at specific points ...

Get Delta Lake: Up and Running now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.