Chapter 8. Processing HTML

Several modules included with Python provide virtually all the tools necessary to parse and process HTML documents without a web server or web browser. Parsing HTML files is increasingly common in applications such as search engines, document indexing, document conversion, data retrieval, and site backup or migration, among others.

Because there is no way to cover the full range of HTML-processing options Python provides, the first two phrases in this chapter focus on specific Python modules that simplify opening HTML documents, both locally and on the Web. The rest of the phrases discuss how to use the Python modules to quickly parse the data in the HTML files to process specific items, ...
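As a rough illustration of the two steps described above (opening an HTML document locally or on the Web, then parsing it for specific items), here is a minimal sketch using Python 3's standard library modules `urllib.request` and `html.parser`, which may differ from the module names the chapter itself uses. The names `LinkCollector`, `open_html`, and `extract_links` are hypothetical helpers chosen for illustration, and collecting `<a href>` values is just one example of "processing specific items."

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def open_html(location):
    """Hypothetical helper: return HTML text from a local path or a URL.

    Anything containing "://" (http, https, file) is opened with urlopen;
    everything else is treated as a local file path.
    """
    if "://" in location:
        with urlopen(location) as response:
            # Use the charset from the response headers, else assume UTF-8
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    return Path(location).read_text(encoding="utf-8", errors="replace")


def extract_links(html_text):
    """Feed the HTML text to LinkCollector and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = "<p>See <a href='https://www.python.org/'>Python</a> and <a href=\"/docs\">docs</a>.</p>"
    print(extract_links(sample))  # ['https://www.python.org/', '/docs']
```

Calling `extract_links(open_html("page.html"))` or `extract_links(open_html("https://www.python.org/"))` applies the same parser to a local file or a remote page; `HTMLParser` tolerates much of the malformed markup found in real pages, which is one reason a subclass like this is a common starting point.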
