Searching with XPath

In order to avoid exhaustive iteration and the checking of every element, we need to use XPath, which is a query language that was developed specifically for XML, and is supported by lxml.

To get started with XPath, use the Python shell from the last section, and do the following:

>>> root.xpath('body') [<Element body at 0x4477530>]

This is the simplest form of XPath expression; it searches for children of the current element that have tag names that match the specified tag name. The current element is the one we call xpath() on—in this case, root. The root element is the top-level <html> element in the HTML document, and so the returned element is the <body> element.

XPath expressions can contain multiple levels of elements. ...

Get Learning Python Networking - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.