July 2017
Intermediate to advanced
796 pages
18h 55m
English
Real-time streaming applications are meant to be long running and resilient to failures of all sorts. Spark Streaming implements a checkpointing mechanism that maintains enough information to recover from failures.
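There are two types of data that need to be checkpointed: metadata (the configuration, the DStream operations, and any incomplete batches needed to restart the driver) and data (the generated RDDs themselves).
Checkpointing can be enabled by calling the checkpoint() function on the StreamingContext as follows: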
def checkpoint(directory: String)
Specifies the directory where the checkpoint data will be reliably stored.
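As a minimal sketch of how this call might be wired into an application, the snippet below builds a StreamingContext and sets its checkpoint directory. The object name, application name, local master, 5-second batch interval, and the /tmp path are all assumptions made for illustration; a real deployment would normally point at a fault-tolerant store such as HDFS.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointSetup {
  // Hypothetical checkpoint location (assumption, not from the text).
  val checkpointDir = "/tmp/streaming-checkpoint"

  // Builds a fresh context and enables checkpointing on it.
  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("CheckpointSetup") // illustrative name
      .setMaster("local[2]")         // local mode, for experimentation only
    val ssc = new StreamingContext(conf, Seconds(5))
    // Enable checkpointing by naming the directory to write to.
    ssc.checkpoint(checkpointDir)
    ssc
  }
}
```

To recover after a driver failure, the same factory function can be handed to StreamingContext.getOrCreate(CheckpointSetup.checkpointDir, CheckpointSetup.createContext _), which rebuilds the context from the checkpoint data if it exists and otherwise calls the function to create a new one.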
Once the checkpoint directory is set, any DStream can be checkpointed into the directory at a specified interval. Looking at the Twitter example, ...
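Independently of the Twitter example, per-DStream checkpointing could be sketched roughly as follows. The socket source on localhost:9999, the word-count logic, the object and method names, and the 10-second interval are assumptions chosen for illustration. The interval passed to checkpoint() should be a multiple of the batch interval of the context the stream belongs to.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object DStreamCheckpointSketch {
  // Expects a context whose checkpoint directory has already been set
  // and whose batch interval divides 10 seconds evenly (for example, 2s).
  def build(ssc: StreamingContext): DStream[(String, Int)] = {
    // Hypothetical text source; host and port are placeholders.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Persist this DStream's RDDs to the checkpoint directory every 10 seconds.
    counts.checkpoint(Seconds(10))
    counts.print()
    counts
  }
}
```

The Spark Streaming programming guide suggests a checkpoint interval of roughly 5 to 10 sliding intervals of the DStream as a starting point, since checkpointing too often reduces throughput and checkpointing too rarely lets the lineage grow.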