Another powerful, fast, and flexible parser is the HTML parser that ships with lxml. Because lxml is an extensive library for parsing both XML and HTML documents, its HTML parser can cope with malformed markup, such as unclosed or misnested tags, while parsing.
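To see that tolerance in action, here is a minimal sketch that parses a deliberately broken, hypothetical HTML fragment held in a string (no download involved). The `<li>` and `<p>` tags are never closed, yet lxml.html builds a well-formed element tree in which each item and paragraph is its own element.

```python
from lxml import html

# Hypothetical malformed markup: none of the <li> or <p> tags are closed.
broken = "<div><ul><li>One<li>Two</ul><p>First<p>Second</div>"

# lxml's HTML parser repairs the markup while building the tree.
tree = html.fromstring(broken)

# Each unclosed tag has become a separate, properly closed element.
items = [li.text for li in tree.iter("li")]
paragraphs = [p.text for p in tree.iter("p")]

print(tree.tag)     # div
print(items)        # ['One', 'Two']
print(paragraphs)   # ['First', 'Second']

# Serialize the repaired tree to inspect the closing tags lxml inserted.
print(html.tostring(tree))
```

Because the input is a single fragment rather than a full document, `html.fromstring` returns the outer `<div>` element itself instead of a whole `<html>` tree.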
Let's start with an example.
Here, we will use the requests module to retrieve the web page and parse it with lxml:
# Importing modules
from lxml import html
import requests

# Download the page and parse its raw bytes into an element tree
response = requests.get('http://packtpub.com/')
tree = html.fromstring(response.content)
Now the whole HTML document is parsed into tree, a tree of elements that we can query in two different ways: with XPath or with CSS selectors. XPath is a query language for navigating the elements and attributes of structured documents such as HTML or XML to find information.
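As a rough illustration of querying such a tree with XPath, the sketch below parses a small hypothetical page from a string instead of downloading one, so the page content, the `title` class, and the `catalog` id are all invented for the example. It selects element text with `text()` and attribute values with `@href`.

```python
from lxml import html

# A small, hypothetical page used in place of a live download.
page = """
<html>
  <body>
    <h1 class="title">Books</h1>
    <ul id="catalog">
      <li><a href="/book/1">First book</a></li>
      <li><a href="/book/2">Second book</a></li>
    </ul>
  </body>
</html>
"""

tree = html.fromstring(page)

# Text of the <h1> whose class attribute equals "title".
title = tree.xpath('//h1[@class="title"]/text()')

# The href attribute of every link inside the list with id="catalog".
links = tree.xpath('//ul[@id="catalog"]/li/a/@href')

# The visible text of those same links.
names = tree.xpath('//ul[@id="catalog"]//a/text()')

print(title)   # ['Books']
print(links)   # ['/book/1', '/book/2']
print(names)   # ['First book', 'Second book']

# A CSS selector equivalent would be tree.cssselect("ul#catalog a"),
# which needs the separate cssselect package to be installed.
```

XPath queries always return a list, even when only one node matches, so code that expects a single value typically indexes the result (for example `title[0]`) after checking that the list is not empty.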
We can use any ...