May 2017
Beginner to intermediate
220 pages
5h 2m
English
To support caching, the download function developed in Chapter 1, Introduction to Web Scraping, needs to be modified to check the cache before downloading a URL. We also need to move throttling inside this function and only throttle when a download is made, and not when loading from a cache. To avoid the need to pass various parameters for every download, we will take this opportunity to refactor the download function into a class so parameters can be set in the constructor and reused numerous times. Here is the updated implementation to support this:
from chp1.throttle import Throttlefrom random import choiceimport requestsclass Downloader: def __init__(self, delay=5, user_agent='wswp', proxies=None, ...
Read now
Unlock full access