Chapter 5: Working with Big Data and Databricks

This chapter covers the following recipes:

  • Setting up an HDInsight cluster
  • Processing data from Azure Data Lake with HDInsight and Hive
  • Building data models in Delta Lake and data pipeline jobs with Databricks
  • Ingesting data into Delta Lake using Mapping data flows
  • External integrations with other compute engines (Snowflake)

Introduction

Azure Data Factory (ADF) excels at orchestrating big data tools, enabling fast, scalable ETL/ELT pipelines over petabytes of stored data. Building a production-ready data engineering cluster without Azure's managed services, however, presents significant challenges. ...

Get Azure Data Factory Cookbook - Second Edition now with the O’Reilly learning platform.