Chapter 9: Batch and Streaming Data Processing with Azure Databricks
Databricks is a data engineering product built on top of Apache Spark. It provides a unified, cloud-optimized platform for performing ETL, machine learning, and AI tasks on large volumes of data.
Azure Databricks, as its name suggests, is the Databricks integration with Azure. It adds fully managed Spark clusters, an interactive workspace for data visualization and exploration, and integration with Azure Data Factory and with data sources such as Azure Blob Storage, Azure Data Lake Storage, Azure Cosmos DB, Azure SQL Data Warehouse, and more.
Azure Databricks can process data from multiple and diverse data sources, such as SQL or NoSQL, structured or unstructured ...