Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Distribution of number of ratings

We can also look at the distribution of the number of ratings made by each user. Recall that we previously computed the rating_data RDD used in the preceding code by splitting the ratings with the tab character. We will now use the rating_data variable again in the following code.

The code resides in the UserRatingsChart object. We will create a DataFrame from the tab-separated u.data file, group it by user_id, and sort by the count of ratings given by each user in ascending order.

object UserRatingsChart {
  def main(args: Array[String]) {
  }
}

Let us first try to show the ratings.

val customSchema = StructType(Array(
  StructField("user_id", IntegerType, true),
  StructField("movie_id", IntegerType, true),
  ...
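The steps described in this section (read the tab-separated u.data file into a DataFrame, group by user_id, and sort by the rating count in ascending order) can be sketched end to end as follows. This is a minimal sketch, not the book's listing. It assumes Spark 2.x with the built-in CSV reader and the usual MovieLens 100k column layout (user id, movie id, rating, timestamp). The object name UserRatingsSketch, the helper ratingsPerUser, and the default path ml-100k/u.data are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}

object UserRatingsSketch {

  // Assumed column layout of u.data: user id, movie id, rating, timestamp
  // (tab separated, no header row). The last two fields are an assumption,
  // not taken from the truncated listing above.
  val ratingSchema: StructType = StructType(Array(
    StructField("user_id", IntegerType, nullable = true),
    StructField("movie_id", IntegerType, nullable = true),
    StructField("rating", IntegerType, nullable = true),
    StructField("timestamp", LongType, nullable = true)))

  // Count the ratings made by each user and sort by that count, ascending.
  def ratingsPerUser(ratings: DataFrame): DataFrame =
    ratings.groupBy("user_id").count().sort("count")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UserRatingsSketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical default path; pass the real location of u.data as the first argument.
    val path = args.headOption.getOrElse("ml-100k/u.data")

    // Read the tab-separated file with the explicit schema.
    val ratings = spark.read
      .option("sep", "\t")
      .schema(ratingSchema)
      .csv(path)

    ratings.show(5)                  // first few raw ratings
    ratingsPerUser(ratings).show(5)  // users with the fewest ratings first

    spark.stop()
  }
}
```

Keeping the aggregation in its own function makes it possible to check the grouping and ordering on a small in-memory DataFrame before running against the full file.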
