The formatter Module
The formatter module provides formatter classes that can be used together with the htmllib module.
This module provides two class families, formatters and writers. Formatters convert a stream of tags and data strings from the HTML parser into an event stream suitable for an output device, and writers render that event stream on an output device. Example 5-13 demonstrates.
In most cases, you can use the AbstractFormatter class to do the formatting. It calls methods on the writer object, each representing a different kind of formatting event. The AbstractWriter class simply prints a message for each method call.
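The division of labor described above can be sketched without the module itself. The following is a minimal illustration, not the module's implementation: ToyFormatter and LoggingWriter are hypothetical names, and the formatter-side methods (end_paragraph, push_font, pop_font, add_flowing_data) are simplified stand-ins. Only the writer-side event names are taken from the output of Example 5-13.

```python
class LoggingWriter:
    """Hypothetical stand-in for a writer: prints and records each event."""

    def __init__(self):
        self.events = []

    def _log(self, name, *args):
        line = "%s(%s)" % (name, ", ".join(repr(a) for a in args))
        self.events.append(line)
        print(line)

    def send_paragraph(self, blankline):
        self._log("send_paragraph", blankline)

    def new_font(self, font):
        self._log("new_font", font)

    def send_flowing_data(self, data):
        self._log("send_flowing_data", data)


class ToyFormatter:
    """Hypothetical stand-in for a formatter: turns parser-level
    calls into events on the writer it was given."""

    def __init__(self, writer):
        self.writer = writer
        self.font_stack = []

    def end_paragraph(self, blankline):
        self.writer.send_paragraph(blankline)

    def push_font(self, font):
        self.font_stack.append(font)
        self.writer.new_font(font)

    def pop_font(self):
        self.font_stack.pop()
        # fall back to the enclosing font, or None if there is none
        current = self.font_stack[-1] if self.font_stack else None
        self.writer.new_font(current)

    def add_flowing_data(self, data):
        if data:
            self.writer.send_flowing_data(data)


# Roughly what a parser might do for a top-level heading
w = LoggingWriter()
f = ToyFormatter(w)
f.end_paragraph(1)
f.push_font(("h1", 0, 1, 0))
f.add_flowing_data("A Chapter.")
f.pop_font()
```

The formatter holds the state (here, only a font stack), while the writer only reacts to events. That separation is what lets one formatter drive different writers.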
Example 5-13. Using the formatter Module to Convert HTML to an Event Stream
File: formatter-example-1.py
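# Python 2 only: htmllib was removed in Python 3
import formatter
import htmllib

# the writer prints one line per formatting event
w = formatter.AbstractWriter()
# the formatter turns parser callbacks into writer events
f = formatter.AbstractFormatter(w)

file = open("samples/sample.htm")

# parse the whole file, then finish parsing
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()

file.close()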
send_paragraph(1)
new_font(('h1', 0, 1, 0))
send_flowing_data('A Chapter.')
send_line_break()
send_paragraph(1)
new_font(None)
send_flowing_data('Some text. Some more text. Some')
send_flowing_data(' ')
new_font((None, 1, None, None))
send_flowing_data('emphasized')
new_font(None)
send_flowing_data(' text. A')
send_flowing_data(' link')
send_flowing_data('[1]')
send_flowing_data('.')
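A writer does not have to print its events. As a rough sketch, the trace above can be replayed on a writer that keeps only the text. TextCollector is a hypothetical class written for this illustration; it is not part of the formatter module, and it skips the word wrapping and margin handling a real text writer would need.

```python
class TextCollector:
    """Hypothetical writer that keeps text and ignores font changes."""

    def __init__(self):
        self.parts = []

    def new_font(self, font):
        pass  # font changes carry no text

    def send_paragraph(self, blankline):
        self.parts.append("\n" * blankline)

    def send_line_break(self):
        self.parts.append("\n")

    def send_flowing_data(self, data):
        self.parts.append(data)

    def text(self):
        # drop the leading newline produced by the first paragraph event
        return "".join(self.parts).strip()


# Replay the events from the trace, in the same order
w = TextCollector()
w.send_paragraph(1)
w.new_font(("h1", 0, 1, 0))
w.send_flowing_data("A Chapter.")
w.send_line_break()
w.send_paragraph(1)
w.new_font(None)
w.send_flowing_data("Some text. Some more text. Some")
w.send_flowing_data(" ")
w.new_font((None, 1, None, None))
w.send_flowing_data("emphasized")
w.new_font(None)
w.send_flowing_data(" text. A")
w.send_flowing_data(" link")
w.send_flowing_data("[1]")
w.send_flowing_data(".")

print(w.text())
```

Because the writer sees the same method calls whether it prints them or stores them, swapping one writer for another changes the output device without touching the formatter or the parser.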
In addition to the AbstractWriter class, the formatter module provides a NullWriter class, which ignores all events passed to it, and a DumbWriter class that converts the ...