Chapter 4: Working with Streaming Data

As data ingestion pipelines evolve, streaming systems such as Azure Event Hubs and Apache Kafka are increasingly used as sources or sinks in data pipeline applications. Streaming data, such as temperature sensor readings and vehicle sensor data, has become commonplace. We need to build our data pipeline applications so that they can process streaming data in real time and at scale. Azure Databricks provides a rich set of APIs, including Spark Structured Streaming, for processing these events in real time. We can store the streaming data in various sinks, including Databricks File System (DBFS), in various file formats, in various streaming systems, such as Event Hubs and Kafka, ...
