Learning Real-time Processing with Spark Streaming by Sumit Gupta

High availability and fault tolerance

In this section, we will talk about the high availability and fault tolerance features of Spark and Spark Streaming in the various deployment models.

High availability in the standalone mode

In standalone mode, a Spark cluster is, by default, resilient to the failure of worker nodes or processes. As soon as a worker goes down, the master chooses another available worker and schedules the jobs there. But what if the master itself goes down? Will it be a single point of failure? No!

Spark standalone mode leverages ZooKeeper (http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html) for leader election and provides the flexibility to create multiple backup masters, which automatically take up the role of master ...
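As a rough illustration of the ZooKeeper-based setup described above, the following is a minimal sketch of what each master's conf/spark-env.sh might contain. The three spark.deploy.* properties are the standard Spark settings for ZooKeeper recovery; the ZooKeeper host names (zk1, zk2, zk3), the port 2181, and the /spark directory are placeholders assumed for this example and must be replaced with values from your own environment.

```shell
# conf/spark-env.sh on every node that runs a standalone master (sketch).
# zk1/zk2/zk3 and /spark are assumed placeholder values, not real hosts.
export SPARK_DAEMON_JAVA_OPTS="\
-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
-Dspark.deploy.zookeeper.dir=/spark"

# Applications and workers can then list every master in the URL, for
# example spark://master1:7077,master2:7077 (master1 and master2 are
# hypothetical host names), so they can register with whichever master
# is currently the leader.
```

With this in place on each master node, starting more than one master against the same ZooKeeper ensemble should leave one as the leader and the others in standby.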
