Chapter 5: Working with Big Data and Databricks

This chapter covers the following recipes:

  • Setting up an HDInsight cluster
  • Processing data from Azure Data Lake with HDInsight and Hive
  • Building data models in Delta Lake and data pipeline jobs with Databricks
  • Ingesting data into Delta Lake using Mapping data flows
  • External integrations with other compute engines (Snowflake)

Introduction

Azure Data Factory (ADF) excels at orchestrating big data tools, enabling fast, scalable ETL/ELT pipelines over petabytes of stored data. Building a production-ready data engineering cluster without Azure's managed services, however, presents significant challenges. ...

Get Azure Data Factory Cookbook - Second Edition now with the O’Reilly learning platform.