April 2017
Intermediate to advanced
532 pages
12h 39m
English
The MovieLens 100k dataset contains 100,000 ratings (on a scale of 1 to 5) given by 943 users to 1,682 movies, along with movie metadata and user profiles. Because the dataset is small, you can download it quickly and run Spark code on it without long waits, which makes it ideal for illustrative purposes.
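To make the shape of a single rating concrete, here is a minimal plain-Python sketch (not Spark code) that models one rating record. It assumes the layout the dataset's README documents for the u.data file: one rating per line, with tab-separated user id, item id, rating, and timestamp fields. The names Rating and parse_rating are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One rating record: a single user's score for a single movie."""
    user_id: int
    movie_id: int
    rating: int     # 1 to 5
    timestamp: int  # Unix time in seconds


def parse_rating(line: str) -> Rating:
    """Parse one line in the assumed u.data layout: user id, item id, rating, timestamp (tab-separated)."""
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))
```

For example, parse_rating("196\t242\t3\t881250949") returns a Rating with user_id=196, movie_id=242, and rating=3. The same parsing function could later be mapped over the lines of the file once it is loaded into Spark.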
You can download the dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip.
Once you have downloaded the data, unzip it using your terminal:
>unzip ml-100k.zip
inflating: ml-100k/allbut.pl
inflating: ml-100k/mku.sh
inflating: ml-100k/README
...
inflating: ml-100k/ub.base
inflating: ml-100k/ub.test
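If you would rather script these two steps than download the file by hand and run unzip, the following minimal sketch does the same thing with only the Python standard library (urllib.request.urlretrieve and zipfile.ZipFile.extractall). The function names are hypothetical, and the sketch assumes the archive is still served at the URL given above.

```python
import os
import urllib.request
import zipfile

# URL from the text; assumption: GroupLens still serves the archive at this address.
ML_100K_URL = "http://files.grouplens.org/datasets/movielens/ml-100k.zip"


def extract_archive(zip_path: str, dest_dir: str) -> list:
    """Extract every member of zip_path into dest_dir (like `unzip`) and return the member names."""
    # Only call extractall on archives from a source you trust.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_and_extract(url: str = ML_100K_URL, dest_dir: str = ".") -> list:
    """Download the archive into dest_dir unless it is already there, then extract it."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    return extract_archive(zip_path, dest_dir)
```

Calling download_and_extract() from your working directory should leave an ml-100k directory there, matching the result of the unzip command. If ml-100k.zip is already present, the download is skipped and only the extraction runs.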
This will create a directory called ml-100k. Change into this directory and examine the contents. The ...
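To examine the contents from Python instead of the shell, a small sketch along these lines lists each file with its size in bytes and previews the first lines of a file, roughly like ls and head. The helpers describe_directory and head are hypothetical names, and the sketch assumes you run it from the directory that contains ml-100k.

```python
import itertools
import os


def describe_directory(path: str) -> list:
    """Return (name, size in bytes) for each regular file directly inside path, sorted by name."""
    entries = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):  # skip subdirectories
            entries.append((name, os.path.getsize(full_path)))
    return entries


def head(path: str, n: int = 5) -> list:
    """Return the first n lines of a text file without trailing newlines, similar to `head -n`."""
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the preview.
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]
```

For example, describe_directory("ml-100k") lists README, u.data, and the other extracted files, and head("ml-100k/u.data") shows the first five rating records.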