July 2017
Intermediate to advanced
796 pages
18h 55m
English
Data checkpointing saves the generated RDDs to reliable storage such as HDFS so that, if the streaming application fails, it can recover the checkpointed RDDs and continue from where it left off. Recovery from application failure is the main use case, but data checkpointing also improves performance when RDDs are lost because of cache cleanup or the loss of an executor: the lost RDDs can be restored from the checkpoint instead of waiting for every parent RDD in the lineage (DAG) to be recomputed.
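The performance argument can be illustrated with a small toy model. The sketch below is plain Python and does not use Spark's API; `ToyRDD`, `build_lineage`, and `recovery_cost` are hypothetical names introduced only for this illustration. It assumes a three-step lineage, simulates a cache loss, and records which steps have to be recomputed with and without a checkpoint.

```python
class ToyRDD:
    """Toy stand-in for an RDD (not Spark's class): a transformation plus a parent."""

    def __init__(self, name, compute, parent=None):
        self.name = name
        self.compute = compute      # function applied to the parent's data
        self.parent = parent
        self.cache = None           # in-memory copy, can be lost
        self.checkpointed = None    # durable copy, survives cache loss

    def materialize(self, recomputed):
        """Return this node's data, appending to `recomputed` every step that is rerun."""
        if self.cache is not None:
            return self.cache
        if self.checkpointed is not None:
            # Restore from the durable copy without touching the parents.
            self.cache = list(self.checkpointed)
            return self.cache
        parent_data = self.parent.materialize(recomputed) if self.parent else None
        recomputed.append(self.name)
        self.cache = self.compute(parent_data)
        return self.cache

    def checkpoint(self, recomputed):
        """Compute the data once and keep a durable copy of it."""
        self.checkpointed = list(self.materialize(recomputed))

    def evict(self):
        """Simulate cache cleanup or executor loss along the whole lineage."""
        node = self
        while node is not None:
            node.cache = None
            node = node.parent


def build_lineage():
    """Assumed example lineage: source -> doubled -> shifted."""
    source = ToyRDD("source", lambda _: [1, 2, 3, 4])
    doubled = ToyRDD("doubled", lambda xs: [x * 2 for x in xs], source)
    return ToyRDD("shifted", lambda xs: [x + 1 for x in xs], doubled)


def recovery_cost(use_checkpoint):
    """Return (data, steps recomputed) when reading the last node after a cache loss."""
    tip = build_lineage()
    warmup = []
    if use_checkpoint:
        tip.checkpoint(warmup)
    else:
        tip.materialize(warmup)
    tip.evict()
    recomputed = []
    data = tip.materialize(recomputed)
    return data, recomputed


if __name__ == "__main__":
    print(recovery_cost(use_checkpoint=False))
    print(recovery_cost(use_checkpoint=True))
```

In this model the data is the same in both cases, but without a checkpoint all three steps are rerun after the cache loss, while with a checkpoint none are. Real lineages are typically much longer in a streaming job, which is why the saving matters there.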
Checkpointing must be enabled for applications with any of the following requirements: