O'Reilly logo

Effective Python Penetration Testing by Rejah Rehim

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Parsing HTML with lxml

Another powerful, fast, and flexible parser is the HTML Parser that comes with lxml. As lxml is an extensive library written for parsing both XML and HTML documents, it can handle messed up tags in the process.

Let's start with an example.

Here, we will use the requests module to retrieve the web page and parse it with lxml:

#Importing modules 
from lxml import html 
import requests 
 
response = requests.get('http://packtpub.com/') 
tree = html.fromstring(response.content) 

Now the whole HTML is saved to tree in a nice tree structure that we can inspect in two different ways: XPath or CSS Select. XPath is used to navigate through elements and attributes to find information in structured documents such as HTML or XML.

We can use any ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required