Extracting features from the MovieLens dataset

Before we can run a clustering algorithm on the data, we use the ALS algorithm to obtain numerical features for the users and items (movies, in this case):

  1. First, we load the u.data file into a DataFrame:
      val ratings = spark.sparkContext
        .textFile(DATA_PATH + "/u.data")
        .map(_.split("\t"))
        .map(lineSplit => Rating(lineSplit(0).toInt,
          lineSplit(1).toInt, lineSplit(2).toFloat,
          lineSplit(3).toLong))
        .toDF()
  1. Then we split it in an 80:20 ratio to get the training and test data:
      val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))
  1. We instantiate the ALS class, setting the maximum number of iterations to 5 and the regularization parameter to 0.01:
      val als = new ALS()
        .setMaxIter(5)
        ...
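
The loading step above calls a Rating constructor that is not defined in this section. As a minimal sketch, assuming Rating is a plain case class whose fields follow the column order of u.data (user ID, item ID, rating, timestamp), the per-line parsing can be tried without Spark. The field names and the ParseRating object are hypothetical and chosen only for illustration:

```scala
// Assumed shape of the Rating class; the field names are hypothetical,
// only the argument order (Int, Int, Float, Long) comes from the loading code.
case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

object ParseRating {
  // Parse one tab-separated line of u.data into a Rating.
  def parse(line: String): Rating = {
    val fields = line.split("\t")
    Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
  }

  def main(args: Array[String]): Unit = {
    // A sample line in the u.data layout: user, item, rating, timestamp.
    val sample = "196\t242\t3\t881250949"
    println(parse(sample))
  }
}
```

In the Spark version, the same function body is what runs inside the two map calls. Converting the resulting RDD with toDF() also requires the session's implicits (import spark.implicits._) to be in scope.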

