- The sample CSV data comes from the MovieLens movie ratings dataset. The archive can be downloaded from http://files.grouplens.org/datasets/movielens/ml-latest-small.zip.
- Once the archive is extracted, we will use the ratings.csv file to load the data into Spark. The first rows of the file look like the following:
| userId | movieId | rating | timestamp |
|--------|---------|--------|------------|
| 1 | 16 | 4 | 1217897793 |
| 1 | 24 | 1.5 | 1217895807 |
| 1 | 32 | 4 | 1217896246 |
| 1 | 47 | 4 | 1217896556 |
| 1 | 50 | 4 | 1217896523 |
| 1 | 110 | 4 | 1217896150 |
| 1 | 150 | 3 | 1217895940 |
| 1 | 161 | 4 | 1217897864 |
| 1 | 165 | 3 | 1217897135 |
| 1 | 204 | 0.5 | 1217895786 |
| ... | ... | ... | ... |
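
Before loading the file into Spark, it can help to sanity-check the column layout shown above. The following is a minimal sketch in plain Python (standard library only, independent of Spark), not the recipe's own code. It assumes the file is comma-separated with a header row and that the timestamp column holds seconds since the Unix epoch. The `SAMPLE` string and the `parse_ratings` helper are hypothetical names introduced for illustration, and the sample rows are copied from the listing above.

```python
import csv
import io
from datetime import datetime, timezone

# A few rows copied from the listing above, inlined so the sketch
# runs without downloading the archive.
# Assumption: the real file is comma-separated with this header row.
SAMPLE = """userId,movieId,rating,timestamp
1,16,4,1217897793
1,24,1.5,1217895807
1,32,4,1217896246
"""


def parse_ratings(text):
    """Parse ratings CSV text into a list of typed dictionaries."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "userId": int(rec["userId"]),
            "movieId": int(rec["movieId"]),
            # Ratings can be fractional (for example 1.5), so use float.
            "rating": float(rec["rating"]),
            # Assumption: timestamp is seconds since the Unix epoch (UTC).
            "timestamp": datetime.fromtimestamp(
                int(rec["timestamp"]), tz=timezone.utc
            ),
        })
    return rows


ratings = parse_ratings(SAMPLE)
for r in ratings:
    print(r["userId"], r["movieId"], r["rating"], r["timestamp"].isoformat())
```

To check the real file, the same helper could be given the text of the extracted ratings.csv instead of `SAMPLE`. The types chosen here (integer IDs, floating-point rating, integer timestamp) are the ones to expect when the data is later read into Spark.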
- Start a new project in IntelliJ or in an IDE of your choice. Make sure ...