Getting ready

We will use the planets data page and convert its data into CSV and JSON files. Let's start by loading the planets data from the page into a list of Python dictionary objects. The following code (found in 03/get_planet_data.py) provides a function that performs this task, which will be reused throughout the chapter:

import requests
from bs4 import BeautifulSoup

def get_planet_data():
   html = requests.get("http://localhost:8080/planets.html").text
   soup = BeautifulSoup(html, "lxml")
   planet_trs = soup.html.body.div.table.findAll("tr", {"class": "planet"})

   def to_dict(tr):
      tds = tr.findAll("td")
      planet_data = dict()
      planet_data['Name'] = tds[1].text.strip()
      planet_data['Mass'] = tds[2].text.strip()
      planet_data['Radius'] = ...
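To make the goal concrete, here is a minimal sketch of how a list of dictionaries shaped like the one described above could be turned into CSV and JSON text using only the standard library. It is not the book's code: `sample_planets`, `planets_to_csv`, and `planets_to_json` are hypothetical names, and the rows are placeholder values standing in for real scraped data, so the sketch runs without the local server.

```python
import csv
import io
import json

# Hypothetical stand-in for a list of planet dictionaries;
# the names and values are placeholders, not data from the planets page.
sample_planets = [
    {"Name": "PlanetA", "Mass": "1.0"},
    {"Name": "PlanetB", "Mass": "2.5"},
]


def planets_to_csv(planets):
    """Return CSV text for a list of dicts, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(planets[0].keys()),
        lineterminator="\n",  # avoid the csv module's default "\r\n"
    )
    writer.writeheader()
    writer.writerows(planets)
    return buffer.getvalue()


def planets_to_json(planets):
    """Return pretty-printed JSON text for a list of dicts."""
    return json.dumps(planets, indent=2)


csv_text = planets_to_csv(sample_planets)
json_text = planets_to_json(sample_planets)
print(csv_text)
print(json_text)
```

Building the output in memory first keeps the conversion separate from file handling; writing `csv_text` or `json_text` to disk is then a plain `open(..., "w")` call.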
