Clojure for Data Science by Henry Garner

Machine learning on Spark with MLlib

We've now covered enough of the basics of Spark to use our RDDs for machine learning. While Spark handles the infrastructure, the machine learning itself is performed by an Apache Spark subproject called MLlib.

Note

An overview of the MLlib library's capabilities is available at https://spark.apache.org/docs/latest/mllib-guide.html.

MLlib provides a wealth of machine learning algorithms for use on Spark, including those for regression, classification, and clustering covered elsewhere in this book. In this chapter, we'll be using the algorithm MLlib provides for performing collaborative filtering: alternating least squares.
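To give a feel for what alternating least squares does before we hand the work to MLlib, here is a minimal sketch in plain Python. It is not MLlib's API and not the book's code: the function names, the tiny ratings table, and the regularization value are all assumptions made up for illustration, and the model is restricted to a single latent factor so that each update reduces to simple arithmetic. The idea it shows is the alternation itself: hold the item factors fixed and solve for the user factors, then hold the user factors fixed and solve for the item factors.

```python
# Rank-1 alternating least squares on a tiny, made-up ratings table.
# Every name and number here is an illustrative assumption, not MLlib's API.

def als_rank1(ratings, n_users, n_items, reg=0.1, iterations=20):
    """Fit rating(u, i) ~ user[u] * item[i] using only the observed entries."""
    user = [1.0] * n_users
    item = [1.0] * n_items
    for _ in range(iterations):
        # Hold the item factors fixed: each user factor then has a
        # closed-form regularized least-squares solution.
        for u in range(n_users):
            num = sum(r * item[i] for (uu, i), r in ratings.items() if uu == u)
            den = sum(item[i] ** 2 for (uu, i) in ratings if uu == u) + reg
            user[u] = num / den
        # Hold the user factors fixed and solve for each item factor.
        for i in range(n_items):
            num = sum(r * user[u] for (u, ii), r in ratings.items() if ii == i)
            den = sum(user[u] ** 2 for (u, ii) in ratings if ii == i) + reg
            item[i] = num / den
    return user, item


def squared_error(ratings, user, item):
    """Sum of squared differences between observed and predicted ratings."""
    return sum((r - user[u] * item[i]) ** 2 for (u, i), r in ratings.items())


# Hypothetical (user, item) -> rating pairs; most combinations are unobserved.
ratings = {
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 2.0, (2, 2): 1.0,
}

user, item = als_rank1(ratings, n_users=3, n_items=3)

# Predict a rating that was never observed: user 0 on item 2.
predicted = user[0] * item[2]
```

A real recommender would use several latent factors per user and item, in which case each update becomes a small linear system rather than a single division. Distributing that work across a cluster is the part MLlib takes care of.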

Movie recommendations with alternating least squares

In Chapter 5, Big Data ...
