The htmllib Module
The htmllib module
supplies a class named HTMLParser that subclasses
SGMLParser and defines start_tag, do_tag, and end_tag
methods, where tag stands for the name of a tag defined in HTML 2.0
(for example, start_a, end_a, and do_br). HTMLParser implements
and overrides methods in terms of calls to methods of a formatter
object, covered later in this chapter. You can subclass
HTMLParser to add or override methods. In addition
to the start_tag, do_tag, and end_tag methods, an
instance h of HTMLParser supplies the following attributes and
methods.
Reference Section
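Because htmllib and sgmllib exist only in Python 2 (both were removed in Python 3.0), the following is a minimal sketch that reproduces the start_tag/do_tag/end_tag naming convention on top of Python 3's html.parser.HTMLParser. TagDispatcher and LinkCollector are hypothetical names for illustration, not part of any library. The lookup order (a start_ method first, then a do_ method for tags that have no end tag, otherwise unknown_starttag) follows the one SGMLParser uses.

```python
from html.parser import HTMLParser


class TagDispatcher(HTMLParser):
    """Hypothetical helper: routes tags to start_<tag>, do_<tag>, end_<tag>.

    Mimics the lookup order of Python 2's sgmllib.SGMLParser: a start_<tag>
    method wins; otherwise a do_<tag> method (for tags with no end tag);
    otherwise the tag goes to unknown_starttag.
    """

    def handle_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None) or getattr(self, "do_" + tag, None)
        if method is not None:
            method(attrs)  # attrs is a list of (name, value) pairs
        else:
            self.unknown_starttag(tag, attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()
        else:
            self.unknown_endtag(tag)

    # Default hooks ignore unhandled tags; subclasses may override them.
    def unknown_starttag(self, tag, attrs):
        pass

    def unknown_endtag(self, tag):
        pass


class LinkCollector(TagDispatcher):
    """Hypothetical subclass: collects link targets, link text, and <br> count."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.anchor_text = []
        self.breaks = 0
        self.in_anchor = False

    def start_a(self, attrs):
        self.in_anchor = True
        self.links.extend(value for name, value in attrs if name == "href")

    def end_a(self):
        self.in_anchor = False

    def do_br(self, attrs):
        # <br> has no end tag, so a do_ method is the natural handler.
        self.breaks += 1

    def handle_data(self, data):
        if self.in_anchor:
            self.anchor_text.append(data)


parser = LinkCollector()
parser.feed('<p>See <a href="https://www.python.org/">Python</a><br>and more.</p>')
parser.close()
print(parser.links)        # ['https://www.python.org/']
print(parser.anchor_text)  # ['Python']
print(parser.breaks)       # 1
```

Keeping the dispatch logic in one base class means a subclass declares handlers only for the tags it cares about, the same convenience that subclassing htmllib.HTMLParser offers.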
The formatter Module
The formatter module
defines formatter and writer classes. You instantiate a formatter by
passing a writer instance to the formatter class, and then you pass the
formatter instance to class HTMLParser of module
htmllib. You can define your own formatters and
writers by subclassing
formatter’s classes and
overriding methods appropriately, but I do not cover this advanced
and rarely used possibility in this book. An application with special
output requirements would typically define an appropriate writer,
subclassing AbstractWriter and overriding all of its
methods, and use class AbstractFormatter as is, without
needing to subclass it. Module formatter supplies
the following classes.
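The formatter module was deprecated in Python 3.4 and removed in Python 3.10, so the sketch below does not import it. Instead it re-creates, in a few lines, the division of labor described above: a formatter decides which structural event happened (flowing text, a line break, the end of a paragraph) and delegates how to render it to a writer. MiniFormatter and ListWriter are hypothetical classes; the method names add_flowing_data, add_line_break, end_paragraph, send_flowing_data, send_line_break, and send_paragraph are borrowed from the Python 2 formatter and writer interfaces, but this is not that module's implementation.

```python
class ListWriter:
    """Hypothetical writer: records output chunks in a list instead of a file."""

    def __init__(self):
        self.chunks = []

    def send_flowing_data(self, data):
        self.chunks.append(data)

    def send_line_break(self):
        self.chunks.append("\n")

    def send_paragraph(self, blankline):
        # blankline is the number of blank lines wanted between paragraphs.
        self.chunks.append("\n" * (blankline + 1))

    def text(self):
        return "".join(self.chunks)


class MiniFormatter:
    """Hypothetical formatter: normalizes whitespace, then calls the writer."""

    def __init__(self, writer):
        self.writer = writer  # a formatter is always built around a writer
        self.at_paragraph_start = True

    def add_flowing_data(self, data):
        # Collapse runs of whitespace, as HTML rendering does.
        text = " ".join(data.split())
        if not text:
            return
        if not self.at_paragraph_start:
            text = " " + text  # separate consecutive chunks with one space
        self.writer.send_flowing_data(text)
        self.at_paragraph_start = False

    def add_line_break(self):
        self.writer.send_line_break()
        self.at_paragraph_start = True

    def end_paragraph(self, blankline):
        self.writer.send_paragraph(blankline)
        self.at_paragraph_start = True


writer = ListWriter()
fmt = MiniFormatter(writer)
fmt.add_flowing_data("Hello,\n   world")
fmt.add_flowing_data("again.")
fmt.end_paragraph(1)
fmt.add_flowing_data("Second paragraph.")
print(repr(writer.text()))  # 'Hello, world again.\n\nSecond paragraph.'
```

To change the output medium, you would swap in a different writer and leave the formatter untouched, which is the same split the text recommends: write a custom writer, reuse the stock formatter.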
The htmlentitydefs Module
The htmlentitydefs
module supplies just one attribute, a dictionary named
entitydefs that maps each entity defined in HTML 2.0 to the corresponding ...
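In Python 3, module htmlentitydefs was renamed html.entities; it still supplies entitydefs, whose values there are one-character strings. The sketch below uses the Python 3 module. replace_entities is a hypothetical helper that expands named references found in entitydefs and leaves unknown names untouched; for production code, html.unescape (Python 3.4 and later) also handles numeric character references.

```python
import re
from html.entities import entitydefs

# Matches a named entity reference such as &amp; or &lt;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_entities(text):
    """Hypothetical helper: expand named entities listed in entitydefs."""

    def expand(match):
        name = match.group(1)
        # Unknown names are left exactly as they appeared.
        return entitydefs.get(name, match.group(0))

    return ENTITY_RE.sub(expand, text)


print(entitydefs["amp"])                                    # &
print(replace_entities("&lt;b&gt; Fish &amp; Chips &bogus;"))  # <b> Fish & Chips &bogus;
```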