Training a model on the MovieLens 100k dataset

We're now ready to train our model! The other inputs required for our model are as follows:

  • rank: This refers to the number of factors in our ALS model, that is, the number of hidden features in our low-rank approximation matrices. Generally, a greater number of factors tends to give a better model, but it has a direct impact on memory usage, both for computation and for storing models for serving, particularly when there are large numbers of users or items. Hence, choosing the rank is often a trade-off in real-world use cases. The rank also affects the amount of training data required, since a higher rank generally needs more data to fit well. A rank in the range of 10 to 200 is usually reasonable.
  • iterations: This refers to the number of iterations to run. While each iteration in ALS is guaranteed ...
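
To make the memory trade-off behind rank concrete, the following is a minimal sketch that estimates the size of the two factor matrices for a few rank values. It is not part of the model training itself. It assumes the standard MovieLens 100k dimensions of 943 users and 1,682 movies, and it assumes dense storage at 8 bytes per value; the function names are hypothetical and chosen only for illustration.

```python
# Rough size of the two ALS factor matrices for a given rank.
# Assumptions (not from the text): dense storage at 8 bytes per value,
# and the standard MovieLens 100k dimensions of 943 users and 1,682 movies.

NUM_USERS = 943
NUM_ITEMS = 1682
BYTES_PER_VALUE = 8


def factor_count(rank, num_users=NUM_USERS, num_items=NUM_ITEMS):
    """Number of values in the user-factor and item-factor matrices combined."""
    # One row of `rank` values per user, plus one row of `rank` values per item.
    return rank * (num_users + num_items)


def factor_bytes(rank, num_users=NUM_USERS, num_items=NUM_ITEMS):
    """Approximate storage for both factor matrices, in bytes."""
    return factor_count(rank, num_users, num_items) * BYTES_PER_VALUE


if __name__ == "__main__":
    # Compare a few ranks from the commonly used 10 to 200 range.
    for rank in (10, 50, 200):
        print(f"rank={rank:>3}: {factor_count(rank):>7,} values, "
              f"{factor_bytes(rank) / 1024:,.1f} KiB")
```

The estimate grows linearly with rank and with the combined number of users and items. For a dataset this small the totals are negligible, but the same arithmetic with millions of users or items shows why rank becomes a practical constraint.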
