Learning Data Science

by Sam Lau, Joseph Gonzalez, Deborah Nolan
September 2023
Beginner
596 pages
15h 31m
English
O'Reilly Media, Inc.

Data Sources

All of the data analyzed in this book are available on the book’s website and GitHub repository. These datasets are from open repositories and from individuals. We acknowledge them all here, and include, as appropriate, the filename for the data stored in our repository, a description of the resource, a link to the original source, a related publication, and the author(s)/owner(s).

To begin, we provide the sources for the four case studies in the book. Our analysis of the data in these case studies is based on research articles or, in one case, a blog post. We generally follow the line of inquiry in these sources, simplifying the analyses to match the level of the book.

Here are the four case studies:

seattle_bus_times.csv
Mark Hallenbeck of the Washington State Transportation Center provides the Seattle Transit data. Our analysis is based on “The Waiting Time Paradox, or, Why Is My Bus Always Late?” by Jake VanderPlas.
aqs_06-067-0010.csv, list_of_aqs_sites.csv, matched_pa_aqs.csv, list_of_purpleair_sensors.json, and purpleair_AMTS
The datasets used in the study of air quality monitors are available from Karoline Barkjohn of the Environmental Protection Agency. These were originally acquired by Barkjohn and collaborators from the US Air Quality System and PurpleAir. Our analysis is based on “Development and Application of a United States-Wide Correction for PM2.5 Data Collected with the PurpleAir Sensor” by Barkjohn, Brett Gantt, and Andrea Clements.
donkeys.csv ...

Publisher Resources

ISBN: 9781098112998