Chapter 6. Using Time Travel
Having worked with databases and tables before, odds are you have had that immediate sense of panic when you forgot a WHERE
clause and accidentally ran a DELETE
or UPDATE
statement against an entire table. We have all been there. Or you may have wondered what your data or schema looked like at a specific point in time for auditing, tracking, or analysis purposes.
Given how data is constantly changing, the following scenarios are common occurrences that, historically, have been difficult to solve or answer:
- Regulatory
Auditing and regulatory compliance can require that data be stored and retrieved for many years or can require that you track certain changes to your data (e.g., GDPR).
- Reproduce experiments and reports
There are often requirements for data scientists or analysts to re-create reports or machine learning experiments and model outputs given a specific set of data at a specific point in time.
- Rollbacks
Accidental or bad DML operations on your data, such as
INSERT
,UPDATE
,DELETE
, andMERGE
, can require fixes and rollbacks to a previous state.- Time-series analysis
Reporting needs can require you to look back or analyze data over time, for example, how many new customers were added over the course of a month.
- Debugging
Troubleshooting ETL pipelines, data quality issues, or broken processes where the specific cause may only be observable in a historical state.
The ability to easily traverse through different versions of data at specific points ...
Get Delta Lake: Up and Running now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.