Installing the MovieLens movie rating dataset

The last thing we have to do before we start writing code and analyzing data with Spark is get some data to analyze. There's some really cool movie ratings data available from a site called grouplens.org, which makes its data publicly available to researchers like us, so let's go grab some. I can't redistribute it myself because of the licensing agreements around it, so instead I'll walk you through going to the GroupLens website and downloading the MovieLens dataset onto your computer. Let's get that out of the way right now.

If you just go to grouplens.org, you should come to this web page:

This is a collection of movie ratings data, which ...

From Frank Kane's Taming Big Data with Apache Spark and Python.
