Now, first of all, we made some changes so that it uses the 1 million ratings dataset from GroupLens instead of the 100,000 ratings dataset. If you want to grab that, go over to the GroupLens website and click on datasets.
You'll find it listed there as the MovieLens 1M Dataset.
This data is a little bit more current; it's from 2003. They do have a current dataset that is updated as of this month, but you're going to need a pretty large cluster to handle the 40 million plus ratings in that dataset, so let's stick with 1 million for now, just ...