December 2018
Beginner to intermediate
796 pages
19h 54m
English
The synchronous scraper uses only Python standard library modules such as urllib. It downloads the home pages of three popular sites, plus a fourth site whose loading time can be delayed to simulate a slow connection. It then prints the size of each page and the total running time.
Here's the code for the synchronous scraper located at src/extras/sync.py:
"""Synchronously download a list of webpages and time it""" from urllib.request import Request, urlopen from time import time sites = [ "http://news.ycombinator.com/", "https://www.yahoo.com/", "http://www.aliexpress.com/", "http://deelay.me/5000/http://deelay.me/", ] def find_size(url): req = Request(url) with urlopen(req) as response: page = response.read() return len(page) ...