Analytics and machine learning models are only as good as the data they're built on. Querying processed data and getting insights from it requires a robust data pipeline and an effective storage solution that ensures data quality, data integrity, and performance.
This guide introduces you to Delta Lake, an open-source format that enables building a lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS. Delta Lake enhances Apache Spark and makes it easy to store and manage massive amounts of complex data by supporting data integrity, data quality, and performance. Data engineers, data scientists, and data practitioners will learn how to build reliable data lakes and data pipelines at scale using Delta Lake.
- Understand key data reliability challenges and how to tackle them
- Learn how to use Delta Lake to improve data reliability
- Concurrently run streaming and batch jobs against your data lake
- Execute update, delete, and merge commands against your data lake
- Use time travel to examine previous versions of your data and roll back to them
- Learn best practices to build effective, high-quality end-to-end data pipelines for real-world use cases
- Integrate with other data technologies such as Presto, Athena, Redshift, and BI tools
Learn how thousands of companies are processing exabytes of data per month with lakehouse architectures built on Delta Lake.
Table of contents
1. Basic Operations on Delta Lakes
   - What is Delta Lake?
   - How to start using Delta Lake
   - Basic operations
   - Unpacking the Transaction Log
   - Table Utilities
2. Time Travel with Delta Lake
3. Continuous Applications with Delta Lake

- Title: Delta Lake: The Definitive Guide
- Release date: April 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098104528