SampleData – a simple API for loading data
Loading data into a Notebook is one of the most repetitive tasks a data scientist performs, yet depending on the framework or data source being used, writing the code can be difficult and time-consuming.
Let's take a concrete example: loading a CSV file from an open data site (say https://data.cityofnewyork.us) into both a pandas DataFrame and an Apache Spark DataFrame.
Note: Going forward, all the code is assumed to run in a Jupyter Notebook.
For pandas, the code is straightforward because the library provides an API that loads data directly from a URL:
    import pandas
    data_url = "https://data.cityofnewyork.us/api/views/e98g-f8hy/rows.csv?accessType=DOWNLOAD"
    building_df = pandas.read_csv(data_url)
    building_df
The last statement, ...
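To experiment with the same pandas call without a network connection, the sketch below feeds read_csv an in-memory buffer instead of a URL. It relies on read_csv accepting file-like objects as well as URLs and paths. The column names and rows are made up for illustration and are not taken from the New York City dataset; sample_csv and sample_df are likewise hypothetical names.

```python
import io

import pandas

# Hypothetical stand-in for the CSV text a URL would return.
# These columns and values are invented; they are not real NYC data.
sample_csv = io.StringIO(
    "building_id,borough,floors\n"
    "1,Manhattan,12\n"
    "2,Brooklyn,4\n"
)

# read_csv accepts a URL, a local path, or a file-like object such as StringIO.
sample_df = pandas.read_csv(sample_csv)

# Inspect the result: number of (rows, columns) and the parsed header names.
print(sample_df.shape)
print(list(sample_df.columns))
```

Swapping the buffer for the data_url string from the earlier snippet should produce a DataFrame in the same way, assuming the site is reachable from the notebook.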