May 2017
Beginner to intermediate
220 pages
5h 2m
English
Beautiful Soup is a popular library that parses a web page and provides a convenient interface to navigate content. If you do not already have this module, the latest version can be installed using this command:
pip install beautifulsoup4
The first step with Beautiful Soup is to parse the downloaded HTML into a soup document. Many web pages do not contain perfectly valid HTML, so Beautiful Soup needs to repair missing or mismatched opening and closing tags. For example, consider this simple web page containing a list with missing attribute quotes and closing tags:
<ul class=country>
    <li>Area
    <li>Population
</ul>
If the Population item is interpreted as a child of the Area item instead of the list, we could get unexpected results when ...
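As a minimal sketch of that first parsing step, the snippet below feeds the broken list to Beautiful Soup and inspects what comes back. The helper names parse_fragment and has_nested_items are hypothetical, introduced only for illustration, and the choice of Python's built-in html.parser is an assumption; other parsers may repair the same markup differently.

```python
from bs4 import BeautifulSoup

# The broken snippet from the text: unquoted attribute value, unclosed <li> tags.
broken_html = "<ul class=country><li>Area<li>Population</ul>"


def parse_fragment(html, parser="html.parser"):
    """Parse an HTML string into a soup document (hypothetical helper)."""
    return BeautifulSoup(html, parser)


def has_nested_items(soup):
    """Return True if any <li> ended up inside another <li> (hypothetical helper)."""
    return any(li.find_parent("li") is not None for li in soup.find_all("li"))


soup = parse_fragment(broken_html)
ul = soup.find("ul")

# prettify() shows the repaired tree, including how the <li> tags were closed.
print(soup.prettify())
print("class attribute:", ul["class"])
print("list items found:", len(soup.find_all("li")))
print("items nested inside each other:", has_nested_items(soup))
```

Passing a different parser name, such as "lxml" or "html5lib", to parse_fragment is one way to compare how each parser closes the list items; both are separate packages that would need to be installed first.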