lxml (https://lxml.de) is a widely used Python library for processing XML documents. It is built on top of the C libraries libxml2 and libxslt.
Its main features are as follows:
- Support for XML and HTML
- An API based on ElementTree
- Support for selecting elements of the document through XPath expressions
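The short sketch below touches each of the three features in turn: parsing XML, using the ElementTree-style API to walk children and read attributes, parsing forgiving HTML, and selecting nodes with XPath. The catalog and paragraph markup are made-up sample data, not taken from the original text.

```python
from lxml import etree, html

# XML support: parse a small, hypothetical catalog document from bytes
xml_doc = etree.fromstring(
    b"<catalog><book id='1'>lxml basics</book>"
    b"<book id='2'>XPath in practice</book></catalog>"
)

# ElementTree-style API: iterate over child elements, read attributes and text
books = [(book.get("id"), book.text) for book in xml_doc]

# HTML support: the HTML parser tolerates the unclosed <p> tags
page = html.fromstring("<html><body><p>First<p>Second</body></html>")
paragraphs = [p.text for p in page.iter("p")]

# XPath support: select the text of the book whose id attribute is '2'
second_title = xml_doc.xpath("//book[@id='2']/text()")[0]

print(books)
print(paragraphs)
print(second_title)
```

Because the API follows ElementTree, code written against the standard library's `xml.etree.ElementTree` usually carries over with few changes, while `xpath()` adds full XPath 1.0 selection on top.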
lxml can be installed from the official Python Package Index (PyPI) with pip:
pip install lxml
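Once the install finishes, one simple way to confirm that the package imports correctly is to print the version tuples that `lxml.etree` exposes for lxml itself and for the underlying libxml2 library. This is only a quick sanity check, not a required step.

```python
from lxml import etree

# LXML_VERSION and LIBXML_VERSION are tuples of integers, e.g. (5, 2, 1, 0)
lxml_version = ".".join(map(str, etree.LXML_VERSION))
libxml_version = ".".join(map(str, etree.LIBXML_VERSION))

print("lxml:", lxml_version)
print("libxml2:", libxml_version)
```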
lxml.etree is a submodule of lxml that provides, among other things, the XPath() class and the xpath() method on elements, both of which evaluate expressions written in XPath syntax. The following example uses the parser to read an HTML file and extract the text of the title tag with an XPath expression:
from lxml import html, etree
simple_page = open('data/simple.html').read()
parser = etree.HTML(simple_page)
...
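Since the contents of `data/simple.html` are not shown, the sketch below substitutes a hypothetical inline HTML string so it runs on its own. It shows two ways of extracting the title text once `etree.HTML()` has returned the root element: calling `xpath()` directly, and compiling the expression once with `etree.XPath()` so it can be reused across many documents.

```python
from lxml import etree

# Hypothetical stand-in for the contents of data/simple.html
simple_page = """<html>
  <head><title>A simple page</title></head>
  <body><h1>Hello</h1></body>
</html>"""

# etree.HTML() parses the markup and returns the root <html> element
root = etree.HTML(simple_page)

# Option 1: evaluate the expression with the element's xpath() method
title_text = root.xpath("//title/text()")[0]

# Option 2: compile the expression once; the XPath object is callable
find_title = etree.XPath("//title/text()")
compiled_title = find_title(root)[0]

print(title_text)  # prints: A simple page
```

Both calls return a list of matches, so indexing with `[0]` assumes the page has at least one `<title>`; on pages that might lack one, check the list before indexing.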