The script has a number of features:
- We are using several packages.
- It has the same context preamble as the Spark scripts we have seen before.
- We initialize dictionaries for years, occupations, and guests. A dictionary maps keys to values. Here, each key is a raw value from the CSV, and its value is the number of times that key occurs in the dataset.
- We open the file and start reading line by line using a reader object.
- On each line, we take the values of interest (year, occupation, and name) and, for each one:
  - See whether the value is present in the appropriate dictionary
  - If it is there, increment the value (counter)
  - Otherwise, initialize an entry in the dictionary
- The entire reader block is wrapped by a try/except handler. There ...
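
The counting steps described in the list can be sketched in plain Python as follows. This is a minimal illustration, not the script itself: the Spark context preamble is omitted, the function name `count_values`, the column order (year, occupation, name), the header row, and the sample rows are all assumptions made for the example, and the exceptions caught are just one reasonable choice.

```python
import csv
import io


def count_values(csv_file):
    """Count how often each year, occupation, and guest name occurs."""
    years = {}
    occupations = {}
    guests = {}
    try:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            # Assumed column order: year, occupation, name
            year, occupation, name = row[0], row[1], row[2]
            for key, counts in ((year, years),
                                (occupation, occupations),
                                (name, guests)):
                if key in counts:
                    counts[key] += 1   # seen before: increment the counter
                else:
                    counts[key] = 1    # first occurrence: initialize the entry
    except (csv.Error, IndexError) as err:
        # Malformed CSV or a row with too few columns
        print(f"Could not read CSV data: {err}")
    return years, occupations, guests


# Made-up sample rows, used only to exercise the function
SAMPLE = (
    "year,occupation,name\n"
    "1999,actor,Guest A\n"
    "1999,comedian,Guest B\n"
    "2000,actor,Guest A\n"
)

years, occupations, guests = count_values(io.StringIO(SAMPLE))
print(years)
print(occupations)
print(guests)
```

The explicit `if key in counts` test mirrors the check, increment, or initialize steps in the list. The same tally could also be written with `counts[key] = counts.get(key, 0) + 1` or with `collections.Counter`.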