June 2025
Beginner to intermediate
473 pages
13h 30m
English
In Chapter 9, I warned against using regular expressions to process HTML documents. Nevertheless, regular expressions are often used in practice for exactly this purpose, simply because they are the obvious tool and require little effort. The following Python script is a good example. Its goal is to download all images linked on an HTML page via <img src=...>.
The code is easy to follow: The script first creates the local directory tmp if it does not exist yet, and then downloads the HTML document at the address specified in url. findall searches that document for expressions of the form <img src="..." and returns a list of the regex group matches as a result (i.e., ...
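The steps described here can be sketched roughly as follows. This is not the book's listing, which is not reproduced in this excerpt, but a minimal illustration under stated assumptions: the pattern IMG_PATTERN and the function names find_image_urls and download_images are hypothetical, the standard urllib module stands in for whatever download mechanism the original script uses, and only double-quoted src attributes are matched.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Hypothetical pattern: the single regex group captures the value of a
# double-quoted src attribute inside an <img ...> tag.
IMG_PATTERN = re.compile(r'<img\s[^>]*?src="([^"]+)"', re.IGNORECASE)


def find_image_urls(html, base_url):
    """Return an absolute URL for every <img src="..."> match in html."""
    # With exactly one group in the pattern, findall returns a list of
    # the group's contents rather than the full matches.
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html)]


def download_images(url, target_dir="tmp"):
    """Download every image referenced on the page at url into target_dir."""
    # Create the local directory if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)

    # Download the HTML document.
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")

    # Fetch each image and store it under its base file name.
    for img_url in find_image_urls(html, url):
        name = os.path.basename(urlparse(img_url).path) or "unnamed"
        with urllib.request.urlopen(img_url) as img:
            with open(os.path.join(target_dir, name), "wb") as out:
                out.write(img.read())
```

Calling download_images("https://example.com/") would run the whole sequence against that page. The sketch also shows why the approach is fragile: a tag written with single quotes, such as <img src='x.png'>, is silently skipped, and an attribute like data-src can be mistaken for src.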