5
Working with Big Data and Databricks
This chapter covers the following recipes:
- Setting up an HDInsight cluster
- Processing data from Azure Data Lake with HDInsight and Hive
- Building data models in Delta Lake and data pipeline jobs with Databricks
- Ingesting data into Delta Lake using Mapping data flows
- External integrations with other compute engines (Snowflake)
Introduction
Azure Data Factory (ADF) excels at integrating big data tools, enabling fast, scalable ETL/ELT pipelines that can manage petabytes of stored data. Building a production-ready data engineering cluster without Azure's managed services, however, presents significant challenges. ...