Collaborative filtering using implicit feedback
Sometimes the available feedback is not a set of ratings but a record of behavior: audio tracks played, movies watched, and so on. At first glance, this data may not look as useful as explicit ratings by users, but it is far more plentiful, because every interaction is recorded, not just the items a user chooses to rate.
Getting ready
We are going to use the Million Song Dataset Challenge data from http://www.kaggle.com/c/msdchallenge/data. You need to download three files:
kaggle_visible_evaluation_triplets.txt
kaggle_users.txt
kaggle_songs.txt
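To make the shape of this implicit feedback concrete, here is a minimal Python sketch of reading play-count triplets. It assumes each line of the triplets file is tab-separated as user ID, song ID, play count; check this against the files you download. The sample lines and the parse_triplets helper are hypothetical and exist only for illustration. IDs are mapped to integers because Spark MLlib's Rating class expects integer user and product IDs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample lines; the real IDs are long opaque strings.
SAMPLE_LINES = [
    "userA\tsongX\t1",
    "userA\tsongY\t5",
    "userB\tsongX\t2",
]


def parse_triplets(
    lines: Iterable[str],
) -> Tuple[List[Tuple[int, int, int]], Dict[str, int], Dict[str, int]]:
    """Turn 'user<TAB>song<TAB>count' lines into integer-indexed play counts."""
    user_index: Dict[str, int] = {}
    song_index: Dict[str, int] = {}
    plays: List[Tuple[int, int, int]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, song, count = line.split("\t")
        # Assign the next free integer the first time an ID is seen.
        u = user_index.setdefault(user, len(user_index))
        s = song_index.setdefault(song, len(song_index))
        plays.append((u, s, int(count)))
    return plays, user_index, song_index


plays, user_index, song_index = parse_triplets(SAMPLE_LINES)
for user, song, count in plays:
    print(user, song, count)
```

The play count stands in for a rating here: a higher count suggests a stronger preference, but a missing pair only means that no play was observed.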
Now perform the following steps:
- Create a songdata folder in hdfs and put all three files here:
  $ hdfs dfs -mkdir songdata
- Upload the song data to hdfs:
  $ hdfs dfs -put kaggle_visible_evaluation_triplets.txt songdata/
  $ hdfs dfs -put kaggle_users.txt songdata/
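If you prefer to script the upload, the steps above can be sketched in Python as follows. This is only an illustration under some assumptions: the three files sit in the current directory, the hdfs command is on your PATH, and the helper names build_upload_commands and run are hypothetical. The -p flag on -mkdir and the final -ls check are additions not in the steps above; -p keeps the script from failing if the folder already exists.

```python
import subprocess
from typing import List

# The three files named in this recipe.
FILES = [
    "kaggle_visible_evaluation_triplets.txt",
    "kaggle_users.txt",
    "kaggle_songs.txt",
]


def build_upload_commands(files: List[str], target_dir: str = "songdata") -> List[List[str]]:
    """Build the hdfs commands: create the folder, upload each file, list it."""
    commands = [["hdfs", "dfs", "-mkdir", "-p", target_dir]]
    for name in files:
        commands.append(["hdfs", "dfs", "-put", name, target_dir + "/"])
    commands.append(["hdfs", "dfs", "-ls", target_dir])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)


# Dry run by default, so nothing touches the cluster until you opt in.
run(build_upload_commands(FILES))
```

Call run(build_upload_commands(FILES), dry_run=False) to execute the commands once the printed output looks right.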