7

Processing Data Using Azure Databricks

Databricks is a data engineering product built on top of Apache Spark. It provides a unified, cloud-optimized platform for running Extract, Transform, and Load (ETL), Machine Learning (ML), and Artificial Intelligence (AI) workloads on large volumes of data.

Azure Databricks, as its name suggests, is the Databricks integration with Azure. It provides fully managed Spark clusters, an interactive workspace for data visualization and exploration, and integration with data sources such as Azure Blob Storage, Azure Data Lake Storage, Azure Cosmos DB, and Azure SQL Data Warehouse.

Azure Databricks can process data from multiple and diverse data sources, such as SQL or NoSQL, structured or unstructured ...
