O'Reilly logo

Apache Spark 2.x for Java Developers by Sumit Kumar, Sourav Gulati

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Benefits of RDD

Following are some benefits that Spark RDD model provides over Hadoop MapReduce Model:

  • Iterative processing: One of the biggest issue, with MapReduce processing is the IO (Input/Output) involved. It really slows down the process of MapReduce if you are running iterative operations where you would basically chain MapReduce jobs to perform multiple aggregations.

Consider running a MapReduce job that reads data from HDFS and performs some aggregation and writes the output back to HDFS. Now, mapper jobs will read data from HDFS and write the output to the local filesystem after completion and Reduce pulls that data and runs the reduce process on it. After which, it writes the output to HDFS (not considering the spill mechanism ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required