12 Case study 3 solution

This section covers

Extracting and visualizing locations
Cleaning data
Clustering locations

Our goal is to extract locations from disease-related headlines to uncover the largest active epidemics within and outside of the United States. We will proceed as follows:

Load the data.
Extract locations from the text using regular expressions and the GeoNamesCache library.
Check the location matches for errors.
Cluster the locations based on geographic distance.
Visualize the clusters on a map, and remove any errors.
Output representative locations from the largest clusters to draw interesting conclusions.
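
The extraction and distance steps above can be sketched in a few lines. This is a minimal illustration only, not the chapter's solution: the `CITY_COORDS` table is a hypothetical stand-in for the names and coordinates a gazetteer such as GeoNamesCache would supply, the function names are invented for this example, and the haversine great-circle formula is assumed here as one common way to measure geographic distance.

```python
import re
from math import asin, cos, radians, sin, sqrt

# Hypothetical stand-in for gazetteer data: city name -> (latitude, longitude)
# in degrees. A real run would draw these from a library such as GeoNamesCache.
CITY_COORDS = {
    "New York": (40.71, -74.01),
    "Miami": (25.77, -80.19),
    "Boston": (42.36, -71.06),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def build_pattern(names):
    """Compile one regex that matches any of the given names as whole words."""
    # Longer names are listed first so that, when two names start at the
    # same position, the longer one is tried first.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


CITY_PATTERN = build_pattern(CITY_COORDS)


def extract_city(headline):
    """Return the first known city name found in the headline, or None."""
    match = CITY_PATTERN.search(headline)
    return match.group(0) if match else None


def great_circle_km(point_a, point_b):
    """Haversine distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


if __name__ == "__main__":
    # Made-up headlines, used only to exercise the functions.
    for headline in ["Zika outbreak reported in Miami", "Mystery virus spreads"]:
        print(headline, "->", extract_city(headline))
    print(great_circle_km(CITY_COORDS["New York"], CITY_COORDS["Boston"]))
```

A pairwise distance function of this kind is what a clustering algorithm would consume in the clustering step; headlines that return `None` are the ones worth inspecting by hand when checking matches for errors.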

Warning: Spoiler alert! The solution to case study 3 is about to be revealed. I strongly encourage you to try ...

Data Science Bookcamp by Leonard Apeltsin
