O'Reilly logo

Apache Spark 2.x for Java Developers by Sumit Kumar, Sourav Gulati

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Advanced streaming sources

For receiver-based sources, fault tolerance of a streaming job depends on two factors, which are failure scenario and type of receiver. A receiver receives the data and replicates them on two Spark executors to achieve fault tolerance and hence if one of the executors fails then the tasks and receivers are executed on another node by Spark without any data loss. However, if the driver fails then by design all the executors fail and the computed data and other meta information gets lost. To achieve fault tolerance on a driver node checkpointing of DStreams should be done, which saves the DAG of DStreams to fault-tolerant storage of the checkpoint. When the failed driver gets restarted it looks for meta information ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required