Learn Python by Building Data Science Applications

by Philipp Kats, David Katz
August 2019
Beginner
482 pages
12h 56m
English
Packt Publishing

Understanding time-based tasks

Pipelines are especially useful for scheduling data collection, such as downloading new data every night.

Say that every morning we want to collect the previous day's data on 311 calls in NYC. First, let's write the pulling function itself. The code is fairly simple. You can take a look at the API documentation for Socrata (the data-sharing platform New York uses) at https://dev.socrata.com/consumers/getting-started.html. The only tricky part is that the dataset can be large, but Socrata won't give us more than 50,000 rows at once. Hence, if the number of rows returned is equal to 50,000, the data was most likely capped, and we'll need to make another pull with the offset, over and over until the number ...
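The paging logic described above can be sketched roughly as follows. This is a minimal illustration, not the book's own code: the helper names (`previous_day_where`, `fetch_page`, `pull_all`), the `created_date` column, and the choice to stop on the first page shorter than the cap are all assumptions. The `$where`, `$limit`, and `$offset` query parameters come from Socrata's query language; the resource URL is left to the caller.

```python
import json
from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import urlopen

PAGE_SIZE = 50_000  # per-request row cap mentioned in the text


def previous_day_where(today):
    """Build a filter for the day before `today` (column name is an assumption)."""
    start = today - timedelta(days=1)
    return (
        f"created_date >= '{start.isoformat()}T00:00:00' "
        f"AND created_date < '{today.isoformat()}T00:00:00'"
    )


def fetch_page(resource_url, where, limit, offset):
    """Request one page of rows as JSON (this performs a network call)."""
    query = urlencode({"$where": where, "$limit": limit, "$offset": offset})
    with urlopen(f"{resource_url}?{query}") as response:
        return json.load(response)


def pull_all(resource_url, where, fetcher=fetch_page, page_size=PAGE_SIZE):
    """Pull page after page, moving the offset, until a page comes back short."""
    rows, offset = [], 0
    while True:
        page = fetcher(resource_url, where, page_size, offset)
        rows.extend(page)
        if len(page) < page_size:  # a short page means this pull was not capped
            return rows
        offset += page_size
```

A morning run might then call `pull_all(resource_url, previous_day_where(date.today()))`, where `resource_url` is the JSON endpoint of the dataset. Passing the fetcher as an argument keeps the loop testable without touching the network.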

Publisher Resources

ISBN: 9781789535365