AMQ Lab at Berkley Evaluated Spark, and RDDs were evaluated through a series of experiments on Amazon EC2 as well as benchmarks of user applications.
- Algorithms used: Logistical Regression and k-means
- Use case: First iteration, multiple iterations.
All the tests used m1.xlarge EC2 nodes with 4 cores and 15 GB of RAM. HDFS was for storage with 256 MB blocks. Refer to the following graph:
The preceding graph shows the comparison between the performance of Hadoop and Spark for the first and subsequent iteration for Logistical Regression:
The preceding graph shows the comparison between ...