Step 3 - Explore and query for related statistics

Let's check the ratings-related statistics using the following code:

// Count all ratings, then the distinct users and distinct movies that appear in them
val numRatings = ratingsDF.count()
val numUsers = ratingsDF.select(ratingsDF.col("userId")).distinct().count()
val numMovies = ratingsDF.select(ratingsDF.col("movieId")).distinct().count()
println("Got " + numRatings + " ratings from " + numUsers + " users on " + numMovies + " movies.")
>>>Got 105339 ratings from 668 users on 10325 movies.

You should find 105,339 ratings from 668 users on 10,325 movies. Next, let's get the maximum and minimum ratings for each movie, along with the number of users who rated it. To do this, you need to run an SQL query on the ratings table we created in memory in the previous step. Making ...
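As a rough illustration of what that query should return, the following sketch computes the same per-movie statistics (highest rating, lowest rating, number of distinct users) with plain Scala collections instead of Spark SQL. It is not the book's query. The Rating and MovieStats case classes, the RatingStats object, and the sample rows are hypothetical, introduced only for this example.

```scala
// Hypothetical record type holding only the userId, movieId and rating columns.
final case class Rating(userId: Int, movieId: Int, rating: Double)

// Hypothetical per-movie summary: highest rating, lowest rating, distinct raters.
final case class MovieStats(movieId: Int, maxRating: Double, minRating: Double, numUsers: Int)

object RatingStats {
  // In-memory analogue of GROUP BY movieId with MAX(rating), MIN(rating) and
  // COUNT(DISTINCT userId), sorted by user count descending, then by movieId.
  def perMovie(ratings: Seq[Rating]): Seq[MovieStats] =
    ratings
      .groupBy(_.movieId)
      .map { case (movieId, rs) =>
        val values = rs.map(_.rating) // each group is non-empty, so reduce is safe
        MovieStats(movieId, values.reduce(_ max _), values.reduce(_ min _), rs.map(_.userId).distinct.size)
      }
      .toSeq
      .sortBy(s => (-s.numUsers, s.movieId))

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, not taken from the MovieLens data.
    val sample = Seq(
      Rating(1, 10, 4.0),
      Rating(2, 10, 2.5),
      Rating(3, 10, 5.0),
      Rating(1, 20, 3.0),
      Rating(2, 20, 3.5),
      Rating(1, 30, 1.0)
    )
    // Overall counts, mirroring the count() and distinct().count() calls above
    val numUsers = sample.map(_.userId).distinct.size
    val numMovies = sample.map(_.movieId).distinct.size
    println(s"Got ${sample.size} ratings from $numUsers users on $numMovies movies.")
    perMovie(sample).foreach(println)
  }
}
```

Running main on the sample prints "Got 6 ratings from 3 users on 3 movies." followed by MovieStats(10,5.0,2.5,3), MovieStats(20,3.5,3.0,2) and MovieStats(30,1.0,1.0,1). In Spark, the corresponding aggregation would be a GROUP BY over the in-memory table, along the lines of SELECT movieId, MAX(rating), MIN(rating), COUNT(DISTINCT userId) FROM ratings GROUP BY movieId, where ratings is an assumed view name.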
