Data
For this chapter, I will be using three datasets. The first, scf_extract.json, is the output of the exercise in Chapter 3, Reading, Exploring, and Modifying Data - Part I. The second, artificial_roads_by_region.csv, is a fabricated dataset containing the total road length in each of several made-up regions. The third is an XML file containing search results for Wikipedia articles about data wrangling; it was obtained from the Wikipedia search API, and details are provided along with the dataset. All three datasets can be retrieved from the data folder in the external resources at https://goo.gl/v4dLc3 and should be placed in the data/input_data folder of this chapter's project directory.
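Before going further, it may help to confirm that the files are where the chapter expects them. The following is a minimal sketch, not part of the chapter's own code: the function name check_input_data is hypothetical, the script is assumed to run from the chapter's project directory, and because the XML file's name is not given here, it is matched by its .xml extension rather than by name.

```python
from pathlib import Path

# Folder named in the text; assumed to be relative to the chapter's project directory.
INPUT_DIR = Path("data/input_data")

# File names given in the text. The XML file's name is not stated,
# so it is looked up by extension instead of by name.
EXPECTED_FILES = ["scf_extract.json", "artificial_roads_by_region.csv"]


def check_input_data(input_dir=INPUT_DIR, expected=EXPECTED_FILES):
    """Return (missing, xml_files) for the given input folder.

    missing   -- expected file names that are not present in input_dir
    xml_files -- names of any .xml files found in input_dir
    """
    input_dir = Path(input_dir)
    missing = [name for name in expected if not (input_dir / name).is_file()]
    xml_files = []
    if input_dir.is_dir():
        xml_files = sorted(p.name for p in input_dir.glob("*.xml"))
    return missing, xml_files


if __name__ == "__main__":
    missing, xml_files = check_input_data()
    if missing:
        print("Missing from", INPUT_DIR, ":", ", ".join(missing))
    if not xml_files:
        print("No XML file found in", INPUT_DIR)
    if not missing and xml_files:
        print("All input data found. XML file(s):", ", ".join(xml_files))
```

If anything is reported as missing, download it from the external resources and place it in data/input_data before continuing.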