The Mahout and Spark integration

Apache Mahout was a general machine learning library built on top of Hadoop. Mahout started out primarily as a Java MapReduce package to run machine learning algorithms. As machine learning algorithms are iterative in nature, MapReduce had major performance and scalability issues, so Mahout stopped the development of MapReduce-based algorithms and started supporting new platforms, such as Spark, H2O, and Flink, with a new package called Samsara.

Let's install Mahout, explore the Mahout shell with Scala bindings, and then build a recommendation system.

Installing Mahout

The latest version of Spark does not work well with Mahout yet, so I used the Spark 1.4.1 version with the Mahout 0.12.2 version. Download the Spark ...

Get Big Data Analytics now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.