4

Ingesting Streaming Data

Apache Spark Structured Streaming is a stream processing engine built on the Spark SQL engine that handles large-scale data streams reliably. You express a streaming computation the same way you would express a batch computation on static data. The Spark SQL engine then runs that computation incrementally and continuously, updating the final result as new streaming data arrives. The system also provides end-to-end fault tolerance through checkpointing and write-ahead logs.
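To make the idea of incremental execution concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API or implementation: the names `batch_word_count`, `IncrementalWordCount`, and `demo` are hypothetical, the checkpoint is a single JSON file, and the write-ahead log is not modeled. It only illustrates that applying the same batch logic one micro-batch at a time, with state saved after each batch, can reach the same result as a single batch run, even across a simulated restart.

```python
import json
import os
import tempfile
from collections import Counter


def batch_word_count(lines):
    """Batch computation: count words over a static list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Toy incremental engine (hypothetical): reuses the batch logic per micro-batch."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.counts = Counter()
        self.next_batch_id = 0
        # Recover state from the checkpoint file, if one exists.
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path) as f:
                saved = json.load(f)
            self.counts = Counter(saved["counts"])
            self.next_batch_id = saved["next_batch_id"]

    def process(self, batch_id, lines):
        # Ignore batches already reflected in the checkpoint (replay after restart).
        if batch_id < self.next_batch_id:
            return
        self.counts.update(batch_word_count(lines))
        self.next_batch_id = batch_id + 1
        self._checkpoint()

    def _checkpoint(self):
        # Write to a temporary file, then rename, so the checkpoint is never half-written.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"counts": self.counts, "next_batch_id": self.next_batch_id}, f)
        os.replace(tmp, self.checkpoint_path)


def demo():
    batches = [["spark streaming"], ["spark sql", "streaming data"], ["data"]]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        engine = IncrementalWordCount(path)
        engine.process(0, batches[0])
        engine.process(1, batches[1])
        # Simulate a crash and restart: a new engine recovers from the checkpoint.
        engine = IncrementalWordCount(path)
        engine.process(1, batches[1])  # replayed batch is skipped
        engine.process(2, batches[2])
        return dict(engine.counts)


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the same counts as running `batch_word_count` once over all four lines. In this sketch, tracking the next expected batch ID is what keeps a replayed batch from being counted twice after the restart.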

Apache Spark Structured Streaming is favored for real-time data processing due to its high-level, unified API that seamlessly ...
