13
Building Streaming Pipelines Using Spark and Scala
The final chapter of this book is another combination of all we’ve learned, but in this case, we’ll be building a streaming pipeline. You can think of streaming as continuous or “real-time” ingestion of data into your analytics system. There are many ways to accomplish this, but usually it involves an event bus or message queuing system. We’ll use Azure Event Hubs as our streaming ingestion source because it exposes a Kafka-compatible endpoint, which Spark can consume easily through its open source Kafka connector. As a data engineer, you need to understand how to handle data efficiently and reliably in real time. Again, we’ll leverage Spark, using its structured streaming capabilities, ...
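To make the setup concrete, here is a minimal sketch of what connecting Spark Structured Streaming to an Event Hubs Kafka-compatible endpoint can look like in Scala. The namespace, event hub name, and connection string are placeholders you would substitute with your own; the exact pipeline built in the chapter may differ.

```scala
import org.apache.spark.sql.SparkSession

object StreamingIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-hubs-kafka-ingest")
      .getOrCreate()

    // Event Hubs exposes its Kafka-compatible endpoint on port 9093.
    // Authentication uses SASL_SSL with the literal username "$ConnectionString"
    // and the Event Hubs connection string as the password.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
      .option("subscribe", "<event-hub-name>")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config",
        """org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="<connection-string>";""")
      .load()

    // Kafka records arrive as binary key/value columns; cast them to
    // strings before any downstream parsing or transformation.
    val events = raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Write to the console sink for demonstration; a real pipeline would
    // target a table, file sink, or another topic, with a checkpoint location.
    val query = events.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Because Event Hubs speaks the Kafka protocol here, no Event Hubs-specific connector is needed: Spark's standard `kafka` source handles the stream, and only the SASL options distinguish it from a plain Kafka cluster.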