The Mahout and Spark integration

Apache Mahout was a general machine learning library built on top of Hadoop. Mahout started out primarily as a Java MapReduce package to run machine learning algorithms. As machine learning algorithms are iterative in nature, MapReduce had major performance and scalability issues, so Mahout stopped the development of MapReduce-based algorithms and started supporting new platforms, such as Spark, H2O, and Flink, with a new package called Samsara.

Let's install Mahout, explore the Mahout shell with Scala bindings, and then build a recommendation system.

Installing Mahout

The latest version of Spark does not work well with Mahout yet, so I used the Spark 1.4.1 version with the Mahout 0.12.2 version. Download the Spark ...

Get Big Data Analytics now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.