The HTMLParser Module
Module HTMLParser
supplies one class, HTMLParser, that you subclass
to override and add methods. HTMLParser.HTMLParser
is similar to sgmllib.SGMLParser, but is simpler
and able to parse XHTML as well. The main differences between
HTMLParser and SGMLParser are
the following:
HTMLParser does not call back to methods named do_tag, start_tag, and end_tag. To process tags and end tags, your subclass X of HTMLParser must override methods handle_starttag and/or handle_endtag and check explicitly for the tags it wants to process.

HTMLParser does not keep track of, nor check, tag nesting in any way.

HTMLParser does nothing, by default, to resolve character and entity references. Your subclass X of HTMLParser must override methods handle_charref and/or handle_entityref if it needs to perform processing of such references.
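The differences listed above can be illustrated with a minimal sketch. The class name LinkCollector and its attributes links, closed, and refs are hypothetical, chosen only for this example. The sketch assumes it may run on Python 3, where the module is named html.parser and the parser converts references itself unless convert_charrefs=False is passed; on the older versions this text describes, the import falls back to module HTMLParser.

```python
try:
    from html.parser import HTMLParser   # module name in Python 3 (assumption)
except ImportError:
    from HTMLParser import HTMLParser    # module name used in this text


class LinkCollector(HTMLParser):
    """Hypothetical subclass X: records links, end tags, and references."""

    def __init__(self):
        try:
            # Newer Python 3 versions resolve references automatically;
            # turn that off so the handle_*ref callbacks are invoked.
            HTMLParser.__init__(self, convert_charrefs=False)
        except TypeError:
            # Older versions take no such argument.
            HTMLParser.__init__(self)
        self.links = []    # href values of <a> tags
        self.closed = []   # names of end tags, in the order seen
        self.refs = []     # (kind, name) pairs for unresolved references

    def handle_starttag(self, tag, attrs):
        # One method receives every start tag; check the tag name explicitly.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

    def handle_endtag(self, tag):
        # No nesting check is performed by the parser; just record the tag.
        self.closed.append(tag)

    def handle_charref(self, name):
        # Called for numeric references such as &#233; (name is '233').
        self.refs.append(('char', name))

    def handle_entityref(self, name):
        # Called for named references such as &amp; (name is 'amp').
        self.refs.append(('entity', name))


h = LinkCollector()
h.feed('<p><a href="x.html">A &amp; B &#233;</a></p>')
h.close()
```

After the call to close, h.links holds the collected href values, h.closed the end tags in document order, and h.refs the references that the subclass would still have to resolve itself.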
The most frequently used methods of an instance
h of a subclass
X of HTMLParser are as
follows.