7.5 Implementing Cluster Analysis: Earthquakes

In the Bigger Data chapter, we worked with real data describing 501 earthquakes that occurred during a month in late 2018. Given the raw data, it might be difficult to see any type of pattern or similarity in this data set. However, if we extend our cluster analysis technique from the previous section, we might discover some interesting results.

7.5.1 File Processing

Our first problem is to process and store the contents of the data file so that we can use them in our clustering algorithm. Recall that in the earthquakes.csv file, the first line contains titles that identify each data item, like this:

Each succeeding line of the file describes one earthquake. The line ...
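As a rough illustration of the file layout just described (one title line, then one line per earthquake), the following sketch reads such a file into a dictionary. It is only a sketch under stated assumptions: the function name read_records, the column names id, latitude, longitude, and mag, and the three sample lines are all hypothetical stand-ins, not the actual contents of earthquakes.csv, and this is not necessarily the approach the chapter goes on to develop.

```python
import csv
import io


def read_records(csv_file, key_field, numeric_fields):
    """Map each line's key to a list of floats taken from numeric_fields.

    read_records and its parameters are hypothetical names for this sketch.
    """
    # DictReader consumes the first line and uses its titles as field names.
    reader = csv.DictReader(csv_file)
    records = {}
    for row in reader:
        try:
            values = [float(row[name]) for name in numeric_fields]
        except (ValueError, TypeError):
            # Skip lines where a requested value is empty or not a number.
            continue
        records[row[key_field]] = values
    return records


# Hypothetical stand-in data; the real file's titles and values may differ.
sample = io.StringIO(
    "id,latitude,longitude,mag\n"
    "q1,61.3,-150.1,2.4\n"
    "q2,19.4,-155.3,\n"
    "q3,-6.2,130.5,5.1\n"
)

quakes = read_records(sample, "id", ["latitude", "longitude", "mag"])
for key, values in quakes.items():
    print(key, values)
```

To read from disk instead, the same function could be called with an opened file, for example with open("earthquakes.csv", newline="") as f, passing whichever column titles the real file uses. Storing each record under a key is one possible design choice; it keeps every earthquake addressable by name when clusters are later built from the numeric values.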
