Book description
Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse with highly performant, cutting-edge Apache Spark capabilities in Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. You will also follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance and to secure, share, and manage high-volume, high-velocity, and high-variety data in your lakehouse with ease.
The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open-source storage layer can offer. In addition to the in-depth examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs.
- Implement the Data Lakehouse Paradigm on Microsoft’s Azure cloud platform
- Benefit from the new Delta Lake open-source storage layer for data lakehouses
- Take advantage of schema evolution, change feeds, live tables, and more
- Write functional PySpark code for data lakehouse ELT jobs
- Optimize Apache Spark performance through partitioning, indexing, and other tuning options
- Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake
Product information
- Title: The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake
- Author(s):
- Release date: July 2022
- Publisher(s): Apress
- ISBN: 9781484282335