In the Bigger Data chapter, we worked with real data describing 501 earthquakes that occurred during a month in late 2018. From the raw data alone, it is difficult to see any pattern or similarity among the earthquakes. However, if we extend the cluster analysis technique from the previous section, we might discover some interesting results.
Our first problem will be to find a way to process and store the data contained in the data file so that we can use it in our clustering algorithm. Recall that in the earthquakes.csv file, the first line contains titles that identify each data item, like this:
Each succeeding line of the file describes one earthquake. The line ...
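As a minimal sketch of how such a file might be processed, we can let Python's csv module treat the first line as titles and turn every later line into a dictionary keyed by those titles. The function name read_records and the three column titles in the sample are assumptions made for illustration only; the actual titles in earthquakes.csv may differ.

```python
import csv
import io

def read_records(csv_file):
    """Return (titles, rows): the header titles and one dict per data line."""
    reader = csv.DictReader(csv_file)   # the first line is consumed as the titles
    rows = list(reader)                 # each remaining line becomes a dict
    return reader.fieldnames, rows

# Hypothetical stand-in for the data file; the real column titles may differ.
sample = io.StringIO(
    "latitude,longitude,magnitude\n"
    "10.5,-20.25,4.6\n"
    "33.0,140.5,5.1\n"
)

titles, quakes = read_records(sample)
print(titles)
print(quakes[0]["magnitude"])
```

To try this on the real file, we could pass in the object returned by open("earthquakes.csv", newline="") instead of the sample. Note that every value is read as a string, so numeric items would need to be converted with float() before they are used in any distance calculation.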