Python in a Nutshell

by Alex Martelli
March 2003
Intermediate to advanced
656 pages
39h 30m
English
O'Reilly Media, Inc.

The HTMLParser Module

Module HTMLParser supplies one class, HTMLParser, that you subclass to override and add methods. HTMLParser.HTMLParser is similar to sgmllib.SGMLParser, but is simpler and able to parse XHTML as well. The main differences between HTMLParser and SGMLParser are the following:

  • HTMLParser does not call back to methods named do_tag, start_tag, and end_tag. To process start tags and end tags, your subclass X of HTMLParser must override methods handle_starttag and/or handle_endtag and check explicitly for the tags it wants to process.

  • HTMLParser does not keep track of, nor check, tag nesting in any way.

  • HTMLParser does nothing, by default, to resolve character and entity references. Your subclass X of HTMLParser must override methods handle_charref and/or handle_entityref if it needs to perform processing of such references.
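
The three differences above can be illustrated with a minimal sketch. It assumes Python 3, where the module described here is available as html.parser rather than HTMLParser, and it passes convert_charrefs=False so that character and entity references are left unresolved, as the text describes. The class name LinkAndTextCollector, its attributes, and the sample markup are hypothetical and chosen only for illustration.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser  # assumption: Python 3 name of the module


class LinkAndTextCollector(HTMLParser):
    """Hypothetical subclass X: collects link targets, end tags, and text."""

    def __init__(self):
        # Assumption: convert_charrefs=False (Python 3.4+) leaves references
        # unresolved, so handle_charref and handle_entityref get called.
        super().__init__(convert_charrefs=False)
        self.links = []
        self.end_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # There is no start_a callback; check the tag name explicitly.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_endtag(self, tag):
        # Nesting is not checked: a stray </b> is reported like any end tag.
        self.end_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)

    def handle_entityref(self, name):
        # Named reference such as &amp; (name is "amp").
        if name in name2codepoint:
            self.text.append(chr(name2codepoint[name]))
        else:
            self.text.append("&%s;" % name)  # unknown entity: keep as written

    def handle_charref(self, name):
        # Numeric reference such as &#233; or &#xe9; (name is "233" or "xe9").
        if name[0] in "xX":
            self.text.append(chr(int(name[1:], 16)))
        else:
            self.text.append(chr(int(name)))


parser = LinkAndTextCollector()
# The mismatched </b> is deliberate: the parser raises no error for it.
parser.feed('<p>Fish &amp; chips: caf&#233; <a href="http://example.com/">menu</b></p>')
parser.close()

print(parser.links)
print(parser.end_tags)
print("".join(parser.text))
```

Keeping the tag checks inside handle_starttag and handle_endtag means any nesting validation, if needed, has to be written by the subclass itself, for example by maintaining its own stack of open tags.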

The most frequently used methods of an instance h of a subclass X of HTMLParser are as follows.

ISBN: 0596001886