December 2018
Beginner to intermediate
490 pages
10h 38m
English
Loading data into a Notebook is one of the most repetitive tasks a data scientist can do, yet depending on the framework or data source being used, writing the code can be difficult and time-consuming.
Let's take a concrete example: loading a CSV file from an open data site (say https://data.cityofnewyork.us) into both a pandas DataFrame and an Apache Spark DataFrame.
Note: Going forward, all the code is assumed to run in a Jupyter Notebook.
For pandas, the code is pretty straightforward, as it provides an API to load directly from a URL:
    import pandas

    data_url = "https://data.cityofnewyork.us/api/views/e98g-f8hy/rows.csv?accessType=DOWNLOAD"
    building_df = pandas.read_csv(data_url)
    building_df
The last statement, ...
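To try the pandas loading step without network access, here is a minimal sketch that feeds `pandas.read_csv` an in-memory file instead of the URL. The sample rows, the column names, and the `sample_csv` and `sample_df` names are hypothetical placeholders, not the contents of the New York City dataset.

```python
import io

import pandas

# Hypothetical sample rows standing in for the downloaded file. The column
# names and values are made up for illustration only.
sample_csv = io.StringIO(
    "building_id,borough,year_built\n"
    "1,Manhattan,1931\n"
    "2,Brooklyn,1975\n"
    "3,Queens,2004\n"
)

# read_csv accepts a file-like object as well as a URL or a local path.
sample_df = pandas.read_csv(sample_csv)

# Print a few properties explicitly so the result is also visible
# when the code runs outside a Notebook.
print(sample_df.shape)
print(list(sample_df.columns))
print(sample_df.head(2))
```

Because `read_csv` treats URLs, paths, and file-like objects the same way, swapping `sample_csv` for `data_url` should give the same kind of DataFrame, assuming the remote file is reachable.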