Mastering Python High Performance by Fernando Doglio

The initial code base

Let's now list all of the code that we'll optimize later, based on the earlier description.

This first version is quite simple: a single-file script that takes care of scraping the site and saving the results in JSON format, as we discussed earlier. The flow is straightforward, and the steps are as follows:

  1. It will query the list of questions page by page.
  2. For each page, it will gather the questions' links.
  3. Then, for each link, it will gather the information listed in the previous points.
  4. It will move on to the next page and start over again.
  5. It will finally save all of the data into a JSON file.
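The five steps above can be sketched as a single loop. This is a minimal, hypothetical sketch, not the book's script: `get_question_links` and `get_question_data` are assumed callables that stand in for the requests/BeautifulSoup scraping, so the control flow stays visible and can run without network access. The `max_pages` limit is also an assumption, added so the loop always terminates.

```python
import json
from typing import Callable, Dict, List

# Hypothetical signatures: in the real script these steps would download and
# parse HTML with requests and BeautifulSoup.
ListPageFn = Callable[[int], List[str]]   # page number -> question links
QuestionFn = Callable[[str], Dict]        # question link -> question details


def scrape(get_question_links: ListPageFn,
           get_question_data: QuestionFn,
           max_pages: int,
           output_path: str) -> List[Dict]:
    """Walk the question list page by page, then save everything as JSON."""
    results: List[Dict] = []
    for page in range(1, max_pages + 1):
        # Steps 1 and 2: query one page of the list and collect its links.
        links = get_question_links(page)
        if not links:
            # An empty page means there are no more questions.
            break
        # Step 3: gather the details of every question on this page.
        for link in links:
            results.append(get_question_data(link))
        # Step 4: the for loop moves on to the next page.
    # Step 5: save all of the collected data into a single JSON file.
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2)
    return results
```

Passing the fetching and parsing steps in as functions makes it easy to replace the real HTTP-based helpers with fakes when testing the loop itself.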

The code is as follows:

from bs4 import BeautifulSoup
import requests
import json

SO_URL = "http://scifi.stackexchange.com"
QUESTION_LIST_URL = ...