Data Science on the Google Cloud Platform, 2nd Edition

by Valliappa Lakshmanan
March 2022
Beginner to intermediate
459 pages
12h 19m
English
O'Reilly Media, Inc.

Chapter 2. Ingesting Data into the Cloud

In Chapter 1, we explored the idea of deciding whether to cancel a meeting in a data-driven way. We decided on a probabilistic decision criterion: to cancel the meeting with a client if the probability of the flight arriving within 15 minutes of the scheduled arrival time was less than 70%. To model the arrival delay given a variety of attributes about the flight, we need historical data that covers a large number of flights. Historical data that includes this information from 1987 onward is available from the US Bureau of Transportation Statistics (BTS). One of the reasons that the government captures this data is to monitor the fraction of flights by a carrier that are on-time (defined as flights that arrive less than 15 minutes late), so as to be able to hold airlines accountable.1 Because the key use case is to compute on-time performance, the dataset that captures flight delays is called Airline On-Time Performance Data. That’s the dataset we will use in this book.
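The decision criterion and the on-time definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical (they are not taken from the book's repository), and the empirical on-time fraction of a handful of made-up delays stands in for the probability that a real model would estimate from the BTS data.

```python
from typing import Iterable

# Thresholds taken from the text; the names themselves are hypothetical.
ON_TIME_THRESHOLD_MIN = 15   # a flight is on time if it arrives less than 15 minutes late
CANCEL_PROBABILITY = 0.70    # cancel the meeting if P(on time) is below 70%


def is_on_time(arrival_delay_min: float) -> bool:
    """Return True if the flight arrived less than 15 minutes late."""
    return arrival_delay_min < ON_TIME_THRESHOLD_MIN


def on_time_fraction(arrival_delays_min: Iterable[float]) -> float:
    """Fraction of flights that were on time (a crude stand-in for a model)."""
    delays = list(arrival_delays_min)
    if not delays:
        raise ValueError("need at least one flight to compute a fraction")
    return sum(is_on_time(d) for d in delays) / len(delays)


def should_cancel_meeting(prob_on_time: float) -> bool:
    """Apply the decision rule: cancel if P(on time) is less than 70%."""
    return prob_on_time < CANCEL_PROBABILITY


# Illustrative, made-up arrival delays in minutes (negative means early).
sample_delays = [-5, 0, 14, 15, 60]
fraction = on_time_fraction(sample_delays)
decision = should_cancel_meeting(fraction)
```

With these sample values, three of the five flights count as on time, so the estimated probability falls below the 70% threshold and the rule says to cancel. A flight that is exactly 15 minutes late is treated as late here, following the "less than 15 minutes" wording.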

All of the code snippets in this chapter are available in the folder 02_ingest of the book’s GitHub repository. See the last section of Chapter 1 for instructions on how to clone the repository, and see the README.md file in the 02_ingest directory for instructions on how to do the steps described in this chapter.

Airline On-Time Performance Data

For nearly 40 years, all major US air carriers have been required to file statistics about each of their domestic flights with ...



Publisher Resources

ISBN: 9781098118945