Let's now return to our transformed, pipe-delimited user-community movie ratings dataset, movie-ratings-data/user-movie-ratings.csv, which contains ratings by 300 users covering 3,000 movies. We will develop an Apache Spark application that uses PCA to reduce the dimensionality of this dataset while preserving as much of its structure as possible. To do this, we will go through the following steps:
- First, let's load the transformed, pipe-delimited user-community movie ratings dataset into a Spark ...