Learning Real-time Processing with Spark Streaming by Sumit Gupta

Resilient distributed datasets and discretized streams

In this section, we will discuss the architecture, motivation, and other important concepts related to resilient distributed datasets (RDDs). We will also look at how Spark libraries and extensions such as Spark Streaming extend and expose RDDs.

Resilient distributed datasets

The resilient distributed dataset (RDD) is an independent concept that was developed at the University of California, Berkeley, and was first implemented in systems like Spark to demonstrate its practical use and power. The RDD is a core component of Spark. It provides an in-memory representation of immutable datasets for parallel and distributed processing. RDDs are more of an abstraction ...
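
The two properties named above, immutability and partitioned parallel processing, can be illustrated with a small toy model. This is only a sketch and not Spark's API: the class `ToyDataset` and its methods `from_items`, `map`, and `collect` are hypothetical names chosen for illustration, and the "parallelism" here is a thread pool inside one process rather than a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyDataset:
    """Hypothetical stand-in for an RDD: immutable and split into partitions."""

    def __init__(self, partitions):
        # Store everything as tuples so partitions cannot be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def from_items(cls, items, num_partitions):
        items = list(items)
        # Round-robin split: partition i receives items i, i+n, i+2n, ...
        return cls(items[i::num_partitions] for i in range(num_partitions))

    def map(self, fn):
        # Each partition is transformed independently (here on a thread pool).
        # The result is a NEW dataset; the original is left untouched.
        with ThreadPoolExecutor() as pool:
            new_parts = list(
                pool.map(lambda part: tuple(fn(x) for x in part), self._partitions)
            )
        return ToyDataset(new_parts)

    def collect(self):
        # Gather the elements of all partitions into a single list.
        return [x for part in self._partitions for x in part]


numbers = ToyDataset.from_items(range(1, 7), num_partitions=3)
squares = numbers.map(lambda x: x * x)

print(sorted(squares.collect()))  # the squared values
print(sorted(numbers.collect()))  # the original values, unchanged
```

The point of the sketch is that a transformation never mutates its input: `map` returns a new dataset built partition by partition, which is what makes it safe to process the partitions independently.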