May 2017
Beginner to intermediate
596 pages
15h 2m
English
Spark Streaming supports both metadata checkpointing and data checkpointing to provide the fault tolerance that critical 24/7 applications require. Metadata checkpointing saves the configuration, the DStream operations, and the incomplete batches so that the overall process can be recovered after a driver failure, while data checkpointing persists the in-flight RDDs to reliable storage. Checkpointing is needed for stateful transformations, which combine data across batches (for example, updateStateByKey). However, for simple processing, where some data loss on failure can be tolerated, it may not be required.
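As a rough illustration of how checkpointing is typically enabled, the following minimal Scala sketch sets a checkpoint directory and recovers the context from it on restart. The object name, checkpoint path, host, port, and batch interval are all hypothetical placeholders rather than values from the text, and the sketch assumes the Spark Streaming library is on the classpath and that the master is supplied by spark-submit.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointSketch {
  // Hypothetical values: use a fault-tolerant path (such as HDFS) and a real source.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"
  val host = "localhost"
  val port = 9999

  // Builds a fresh context; invoked only when no checkpoint data exists yet.
  def createContext(): StreamingContext = {
    // The master URL is assumed to be provided by spark-submit.
    val conf = new SparkConf().setAppName("CheckpointSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Enables checkpointing to the given directory.
    ssc.checkpoint(checkpointDir)

    val words = ssc.socketTextStream(host, port).flatMap(_.split(" "))

    // Stateful transformation: a running count per word across batches.
    // This kind of operation requires a checkpoint directory to be set.
    val counts = words.map(word => (word, 1)).updateStateByKey[Int] {
      (newValues: Seq[Int], running: Option[Int]) =>
        Some(newValues.sum + running.getOrElse(0))
    }
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Rebuild the context from checkpoint data if present; otherwise create it.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Defining the whole DStream graph inside the creating function matters here: when the context is restored from a checkpoint, the function is not called, and the saved operations are used instead.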