Chapter 3: Creating ETL Operations with Azure Databricks

In this chapter, we will learn how to set up connections to external data sources such as Amazon Simple Storage Service (S3), set up our Azure Storage account, and use Azure Databricks notebooks to create extract, transform, and load (ETL) operations that clean and transform data. We will also leverage Azure Data Factory (ADF), and finally, we will look at an example of designing an event-driven ETL operation. By working through the sections in this chapter, you will gain a high-level understanding of how data can be loaded from external sources and then transformed in data pipelines that are built and orchestrated with Azure Databricks. Let's start with a brief overview ...
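As a preview of the kind of notebook-based ETL covered later in the chapter, the following is a minimal PySpark sketch of an extract, transform, and load step against Azure Blob Storage. The storage account name (examplestorageacct), container names, secret scope, and column names are illustrative assumptions, not values from the book; spark and dbutils are provided automatically in a Databricks notebook.

    # Minimal ETL sketch for an Azure Databricks notebook.
    # Storage account, containers, secret scope, and columns are hypothetical.
    from pyspark.sql import functions as F

    # Configure access to the (assumed) storage account using a secret-backed key.
    spark.conf.set(
        "fs.azure.account.key.examplestorageacct.blob.core.windows.net",
        dbutils.secrets.get(scope="example-scope", key="storage-account-key"),
    )

    # Extract: read raw CSV files from the "raw" container.
    raw_df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("wasbs://raw@examplestorageacct.blob.core.windows.net/sales/")
    )

    # Transform: drop rows missing an order ID and parse the date column.
    clean_df = (
        raw_df
        .dropna(subset=["order_id"])
        .withColumn("order_date", F.to_date("order_date"))
    )

    # Load: write the cleaned data to a "curated" container in Parquet format.
    (
        clean_df.write
        .mode("overwrite")
        .parquet("wasbs://curated@examplestorageacct.blob.core.windows.net/sales/")
    )

The same extract-transform-load pattern applies whether the source is Azure Storage, S3, or another external system; only the connection configuration and file paths change.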
