Step 3 - Explore and query for related statistics

Let's check some statistics on the ratings: the total number of ratings and the number of distinct users and movies. Use the following lines of code:

val numRatings = ratingsDF.count()
val numUsers = ratingsDF.select(ratingsDF.col("userId")).distinct().count()
val numMovies = ratingsDF.select(ratingsDF.col("movieId")).distinct().count()
println("Got " + numRatings + " ratings from " + numUsers + " users on " + numMovies + " movies.")
>>>Got 105339 ratings from 668 users on 10325 movies.

You should find 105,339 ratings from 668 users on 10,325 movies. Now, let's get the maximum and minimum ratings along with the count of users who have rated a movie. To do this, you need to run an SQL query on the ratings table we created in memory in the previous step. Making ...
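
As a rough illustration of the kind of query described above, here is a minimal sketch, not necessarily the query the book goes on to use. It assumes the in-memory table was registered as a temporary view named ratings and that it has a rating column alongside userId and movieId; both the view name and the rating column are assumptions. The object name RatingStatsSketch, the helper perMovieStats, and the three sample rows are hypothetical stand-ins so that the example runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RatingStatsSketch {

  // Per-movie maximum rating, minimum rating, and number of distinct users.
  // Assumes a temporary view named "ratings" (hypothetical name) with the
  // columns userId, movieId, and rating.
  def perMovieStats(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT movieId,
        |       MAX(rating) AS maxRating,
        |       MIN(rating) AS minRating,
        |       COUNT(DISTINCT userId) AS numUsers
        |FROM ratings
        |GROUP BY movieId
        |ORDER BY numUsers DESC""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RatingStatsSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up stand-in for ratingsDF; the real data comes from the
    // ratings file loaded in the earlier steps.
    val ratingsDF = Seq(
      (1, 10, 4.0),
      (2, 10, 2.5),
      (1, 20, 5.0)
    ).toDF("userId", "movieId", "rating")

    // Register the DataFrame so that it can be queried with SQL.
    ratingsDF.createOrReplaceTempView("ratings")

    perMovieStats(spark).show()
    spark.stop()
  }
}
```

Grouping by movieId gives one row per movie, and COUNT(DISTINCT userId) counts each user once even if the data happens to contain duplicate ratings.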
