September 2015
Beginner to intermediate
608 pages
13h 43m
English
We've now covered enough of the basics of Spark to use our RDDs for machine learning. While Spark handles the infrastructure, the actual work of performing machine learning is handled by an Apache Spark subproject called MLlib.
An overview of the MLlib library's capabilities is available at https://spark.apache.org/docs/latest/mllib-guide.html.
MLlib provides a wealth of machine learning algorithms for use on Spark, including those for regression, classification, and clustering covered elsewhere in this book. In this chapter, we'll be using the algorithm MLlib provides for performing collaborative filtering: alternating least squares.
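To give a feel for what "alternating" means here, the following is a minimal sketch of the idea in plain Python. It is not MLlib's implementation or API. The function names (`als_rank1`, `squared_error`), the toy ratings, and the choice of a single latent factor per user and item are all assumptions made for illustration. Each rating is approximated by the product of a user factor and an item factor. Holding the item factors fixed turns the fit for each user into a small regularized least-squares problem with a closed-form answer, and the same holds for items when the user factors are fixed, so the algorithm simply alternates between the two.

```python
# Rank-1 alternating least squares on a toy ratings set.
# Illustrative sketch only: names and data are hypothetical, not MLlib's API.

def squared_error(ratings, user_f, item_f):
    """Sum of squared differences between observed and predicted ratings."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items())


def als_rank1(ratings, iterations=20, reg=0.1):
    """Fit one latent factor per user and per item.

    ratings: dict mapping (user, item) -> observed rating.
    reg: L2 regularization strength (assumed value, chosen arbitrarily).
    """
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    user_f = {u: 1.0 for u in users}
    item_f = {i: 1.0 for i in items}

    for _ in range(iterations):
        # Step 1: hold item factors fixed, solve for each user factor.
        for u in users:
            num = sum(r * item_f[i] for (uu, i), r in ratings.items() if uu == u)
            den = reg + sum(item_f[i] ** 2 for (uu, i) in ratings if uu == u)
            user_f[u] = num / den
        # Step 2: hold user factors fixed, solve for each item factor.
        for i in items:
            num = sum(r * user_f[u] for (u, ii), r in ratings.items() if ii == i)
            den = reg + sum(user_f[u] ** 2 for (u, ii) in ratings if ii == i)
            item_f[i] = num / den

    return user_f, item_f


# Toy data: three users, three items, six observed ratings.
toy_ratings = {
    ("alice", "x"): 5.0, ("alice", "y"): 3.0,
    ("bob", "x"): 4.0, ("bob", "z"): 1.0,
    ("carol", "y"): 2.0, ("carol", "z"): 1.0,
}

initial_error = squared_error(
    toy_ratings,
    {u: 1.0 for u, _ in toy_ratings},
    {i: 1.0 for _, i in toy_ratings},
)
user_factors, item_factors = als_rank1(toy_ratings)
final_error = squared_error(toy_ratings, user_factors, item_factors)

# Predict a rating that was never observed: alice on item "z".
predicted_alice_z = user_factors["alice"] * item_factors["z"]
```

A practical implementation uses several latent factors per user and item, which turns each closed-form update into a small linear solve, and distributes those independent solves across a cluster.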
In Chapter 5, Big Data ...