Get full access to Apache Spark 2.x for Java Developers and 60K+ other titles, with a free 10-day trial of O'Reilly.

There are also live events, courses curated by job role, and more.

Benefits of RDD

Following are some benefits that Spark RDD model provides over Hadoop MapReduce Model:

Iterative processing: One of the biggest issue, with MapReduce processing is the IO (Input/Output) involved. It really slows down the process of MapReduce if you are running iterative operations where you would basically chain MapReduce jobs to perform multiple aggregations.

Consider running a MapReduce job that reads data from HDFS and performs some aggregation and writes the output back to HDFS. Now, mapper jobs will read data from HDFS and write the output to the local filesystem after completion and Reduce pulls that data and runs the reduce process on it. After which, it writes the output to HDFS (not considering the spill mechanism ...

Get Apache Spark 2.x for Java Developers now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Don’t leave empty-handed

Get Mark Richards’s Software Architecture Patterns ebook to better understand how to design components—and how they should interact.

It’s yours, free.

Get it now

Check it out now on O’Reilly

Dive in for free with a 10-day trial of the O’Reilly learning platform—then explore all the other resources our members count on to build skills and solve problems every day.

Start your free trial Become a member now