This section will walk through visualizing the data from the San Francisco Fire Department.
- Execute the following script to list the unique values in the Call Type Group column:
df.select('Call Type Group').distinct().show()
- The column contains five distinct values, four named categories plus null:
- Alarm.
- Potentially Life-threatening.
- Non Life-threatening.
- Fire.
- null.
- One of those values is null, which marks rows where the call type group is missing. A row count for each unique value shows how many nulls the dataset contains. Execute the following script to generate that count for the Call Type Group column:
df.groupBy('Call Type Group').count().show()
- Unfortunately, there are over 2.8 M rows of data ...
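
To make the grouping step concrete, here is a minimal plain-Python sketch of what a group-and-count over the Call Type Group column computes. It does not use Spark: the sample rows are made up for illustration, `None` stands in for a Spark null, and `count_by_group` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Made-up sample values standing in for the 'Call Type Group' column.
# None plays the role of a Spark null. These are NOT real dataset rows.
sample_call_type_groups = [
    "Alarm",
    "Fire",
    None,
    "Potentially Life-threatening",
    None,
    "Alarm",
    "Non Life-threatening",
    None,
]


def count_by_group(values):
    """Map each distinct value (including None) to the number of rows holding it."""
    return Counter(values)


counts = count_by_group(sample_call_type_groups)

# Rows with a missing call type group.
null_rows = counts[None]

# Print one line per distinct value, largest group first.
for group, n in counts.most_common():
    print(f"{group!s:<30} {n}")
print("null rows:", null_rows)
```

As in the Spark result, the null group appears as its own row in the counts, so the number of missing values can be read off directly.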