Data Visualization with Python and JavaScript, 2nd Edition

by Kyran Dale
December 2022
Content level: Beginner to intermediate
566 pages
12h 58m
English
O'Reilly Media, Inc.

Chapter 5. Getting Data Off the Web with Python

A fundamental part of the data visualizer’s skill set is getting the right dataset in as clean a form as possible. Sometimes you will be given a nice, clean dataset to analyze, but often you will be tasked with finding the data, cleaning the data supplied, or both.

More often than not these days, getting data means getting it off the web. There are various ways you can do this, and Python provides some great libraries that make fetching the data easy.

The main ways to get data off the web are:

  • Get a raw data file in a recognized data format (e.g., JSON or CSV) over HTTP.

  • Use a dedicated API to get the data.

  • Scrape the data by getting web pages via HTTP and parsing them locally for the required data.
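Whichever route you take, you end up parsing a response body locally. The following is a minimal sketch of that parsing step using only the standard library. The sample strings are made-up stand-ins for HTTP response bodies, and the helper names (parse_csv, parse_json, CellCollector, scrape_cells) are hypothetical, not taken from the book. A dedicated API will typically hand back JSON, so the JSON helper covers that case too.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Made-up sample payloads standing in for HTTP response bodies.
CSV_TEXT = "name,year\nAda,1815\nGrace,1906\n"
JSON_TEXT = '[{"name": "Ada", "year": 1815}]'
HTML_TEXT = "<table><tr><td>Ada</td><td>1815</td></tr></table>"


def parse_csv(text):
    """Turn raw CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Turn raw JSON text (e.g., from a file or an API) into Python objects."""
    return json.loads(text)


class CellCollector(HTMLParser):
    """Collect the text found inside <td> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())


def scrape_cells(html):
    """Scrape table-cell text out of an HTML string."""
    parser = CellCollector()
    parser.feed(html)
    return parser.cells


csv_rows = parse_csv(CSV_TEXT)
json_rows = parse_json(JSON_TEXT)
html_cells = scrape_cells(HTML_TEXT)
```

Note that the CSV route returns every value as a string, while JSON preserves numbers, which is one reason cleaning is usually needed after fetching.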

This chapter will deal with these ways in turn, but first let’s get acquainted with the best Python HTTP library out there: Requests.

Getting Web Data with the Requests Library

As we saw in Chapter 4, the files that are used by web browsers to construct web pages are communicated via the Hypertext Transfer Protocol (HTTP), first developed by Tim Berners-Lee. Getting web content in order to parse it for data involves making HTTP requests.
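To make the idea of an HTTP request concrete, here is a minimal sketch using the Requests library (assumed to be installed). The URL and the year parameter are placeholders chosen for illustration, and build_request and fetch are hypothetical helper names. The first helper only prepares a GET request so its parts can be inspected without touching the network; the second shows one common way to actually send it.

```python
import requests

# Placeholder URL, used only for illustration.
URL = "https://example.com/data.json"


def build_request(url, params=None):
    """Prepare (but do not send) an HTTP GET request for inspection."""
    request = requests.Request("GET", url, params=params)
    return request.prepare()


def fetch(url, params=None, timeout=10):
    """Send a GET request and return the response body as text."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    return response.text


# Inspect the request that would go over the wire; nothing is sent here.
prepared = build_request(URL, params={"year": "2022"})
print(prepared.method, prepared.url)
```

Calling fetch(URL) would perform the request for real; it is left uncalled here so the sketch runs offline.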

Negotiating HTTP requests is a vital part of any general-purpose language, but getting web pages with Python used to be a rather irksome affair. The venerable urllib2 library was hardly user-friendly, with a very clunky API. Requests, courtesy of Kenneth Reitz, changed that, making ...

ISBN: 9781098111861