Data Science Bookcamp

by Leonard Apeltsin
November 2021
Beginner to intermediate
704 pages
20h 16m
English
Manning Publications

12 Case study 3 solution

This section covers

  • Extracting and visualizing locations
  • Cleaning data
  • Clustering locations

Our goal is to extract locations from disease-related headlines to uncover the largest active epidemics within and outside of the United States. We will proceed as follows:

  1. Load the data.

  2. Extract locations from the text using regular expressions and the GeoNamesCache library.

  3. Check the location matches for errors.

  4. Cluster the locations based on geographic distance.

  5. Visualize the clusters on a map, and remove any errors.

  6. Output representative locations from the largest clusters to draw interesting conclusions.

Warning: Spoiler alert! The solution to case study 3 is about to be revealed. I strongly encourage you to try ...
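A minimal, self-contained sketch of steps 2, 4, and 6 follows, under several stated assumptions. The `GAZETTEER` dictionary is a hypothetical stand-in for the place names and coordinates that GeoNamesCache would supply, and the headlines are made-up toy inputs. For step 4 it swaps in DBSCAN with a haversine (great-circle) distance, one common way to cluster by geographic distance; the helper names, `eps_km=250`, and `min_samples=2` are illustrative choices, not necessarily the book's own solution.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for GeoNamesCache: place name -> (latitude, longitude).
GAZETTEER = {
    "Miami": (25.77, -80.19),
    "Fort Lauderdale": (26.12, -80.14),
    "Boca Raton": (26.36, -80.08),
    "Lagos": (6.45, 3.39),
    "Ibadan": (7.38, 3.90),
    "Tokyo": (35.69, 139.69),
}

# Step 2: one regex alternation over all names, longest first so multi-word
# names win; \b restricts matches to whole words.
_names = sorted(GAZETTEER, key=len, reverse=True)
LOCATION_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, _names)) + r")\b")

def extract_location(headline):
    """Return the first gazetteer name found in the headline, or None."""
    match = LOCATION_RE.search(headline)
    return match.group(0) if match else None

def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dbscan(points, eps_km, min_samples):
    """Step 4: minimal DBSCAN over (lat, lon) points; label -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        return [j for j in range(len(points))
                if great_circle_km(points[i], points[j]) <= eps_km]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_samples:  # core point: keep expanding
                queue.extend(more)
    return labels

def summarize_clusters(names, coords, labels):
    """Step 6: (representative name, size) pairs, largest cluster first."""
    sizes = Counter(label for label in labels if label != -1)
    summary = []
    for label, size in sizes.most_common():
        members = [i for i, lab in enumerate(labels) if lab == label]
        # Representative: member with the smallest summed distance to the others.
        best = min(members, key=lambda i: sum(
            great_circle_km(coords[i], coords[j]) for j in members))
        summary.append((names[best], size))
    return summary

# Made-up headlines for illustration only.
headlines = [
    "Zika outbreak spreads in Miami",
    "Fort Lauderdale reports new Zika cases",
    "Zika confirmed in Boca Raton",
    "Lagos hit by Lassa fever",
    "Lassa fever cases climb in Ibadan",
    "Measles case reported in Tokyo",
    "Scientists study flu vaccine",
]
names = [n for n in map(extract_location, headlines) if n is not None]
coords = [GAZETTEER[n] for n in names]
labels = dbscan(coords, eps_km=250, min_samples=2)
summary = summarize_clusters(names, coords, labels)
print(labels)
print(summary)
```

Here the three Florida cities form the largest cluster, the two Nigerian cities form a second one, and the lone Tokyo headline is labeled noise. DBSCAN suits this task because it needs no preset cluster count and leaves isolated mentions unclustered; step 3 (checking matches for errors) would still be needed on real headlines, where place names can collide with ordinary words.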



ISBN: 9781617296253