The htmllib Module

The htmllib module supplies a class named HTMLParser that subclasses SGMLParser and defines start_tag, do_tag, and end_tag methods for HTML 2.0 tags. HTMLParser implements and overrides methods so that they call methods of a formatter object, covered in “The formatter Module.” You can subclass HTMLParser and override any of these methods to customize processing. In addition to the start_tag, do_tag, and end_tag methods, an instance h of HTMLParser supplies the following attributes and methods.

anchor_bgn

h.anchor_bgn(href,name,type)

Called for each <a> tag. href, name, and type are the string values of the tag’s attributes with the same names. HTMLParser’s implementation of anchor_bgn maintains a list of outgoing hyperlink targets (i.e., the href arguments of method h.anchor_bgn) in an instance attribute named h.anchorlist.

anchor_end

h.anchor_end( )

Called for each </a> end tag. HTMLParser’s implementation of anchor_end emits to the formatter a footnote reference that is an index within h.anchorlist. In other words, by default, HTMLParser asks the formatter to format an <a>/</a> tag pair as the text inside the tag, followed by a footnote reference number that points to the URL in the <a> tag. Of course, it’s up to the formatter to deal with this formatting request.

anchorlist

The h.anchorlist attribute contains the list of outgoing hyperlink target URLs, as built by method h.anchor_bgn.
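The interplay of anchor_bgn, anchor_end, and anchorlist can be sketched as follows. Because htmllib exists only in Python 2, this sketch uses Python 3's html.parser.HTMLParser as a stand-in; the class name AnchorCollector, its output list (which plays the role of the formatter), and the example URLs are assumptions made for illustration and are not part of htmllib.

```python
from html.parser import HTMLParser  # Python 3 stand-in; htmllib is Python 2 only


class AnchorCollector(HTMLParser):
    """Hypothetical class that imitates the anchor handling described above."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []     # outgoing hyperlink targets, in document order
        self.output = []         # collected text, standing in for a formatter
        self._in_anchor = False  # True while inside an <a> tag that has an href

    def anchor_bgn(self, href, name, type):
        # Record the target only when the tag actually carries an href
        self._in_anchor = bool(href)
        if href:
            self.anchorlist.append(href)

    def anchor_end(self):
        # Emit a footnote reference: the 1-based position of the most recent target
        if self._in_anchor:
            self.output.append("[%d]" % len(self.anchorlist))
            self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            d = dict(attrs)
            self.anchor_bgn(d.get("href") or "", d.get("name") or "",
                            d.get("type") or "")

    def handle_endtag(self, tag):
        if tag == "a":
            self.anchor_end()

    def handle_data(self, data):
        self.output.append(data)


p = AnchorCollector()
p.feed('<p>See <a href="http://example.com/a">first</a> and '
       '<a href="http://example.com/b">second</a>.</p>')
p.close()
text = "".join(p.output)
print(text)          # See first[1] and second[2].
print(p.anchorlist)  # ['http://example.com/a', 'http://example.com/b']
```

Each footnote number is simply the length of anchorlist at the time the end tag is seen, so the numbers index the list starting from 1.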

formatter

The h.formatter attribute is the formatter object f associated with h, which you pass as the ...
