September 2017
Beginner to intermediate
360 pages
8h 13m
English
Resilient—that means RDDs are fault tolerant—now the question is, if they are in-memory, how can they recover from the failures unless they are persisted, which brings back the disk latency? Well let me help you all here—the answer is RDD lineage that helps in the re-computation of missing/damaged partition data in the event of any failures.
Distributed—well an RDD data resides in-memory across the different worker nodes in the cluster.
Dataset—well it's a collection ...
Read now
Unlock full access