May 2001
Intermediate to advanced
304 pages
English
The htmlentitydefs module contains a dictionary, entitydefs, that maps the names of many
ISO Latin-1 character entities used by HTML to the corresponding characters. Its use is
demonstrated in Example 5-10.
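Python 3 renamed the module to html.entities. There, entitydefs still maps entity names to characters, now as Unicode strings. A companion dictionary, codepoint2name, maps code points back to entity names. Here is a minimal sketch of the same lookup, assuming Python 3:

```python
from html.entities import entitydefs, codepoint2name

# entitydefs maps an entity name to the character it stands for
for entity in ("amp", "quot", "copy", "yen"):
    print(entity, "=", entitydefs[entity])

# codepoint2name goes the other way: code point -> entity name
print(codepoint2name[ord("\u00a9")])
```

The final line prints the name registered for U+00A9, which is the same "copy" entity looked up in the loop.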
Example 5-10. Using the htmlentitydefs Module
File: htmlentitydefs-example-1.py
import htmlentitydefs
entities = htmlentitydefs.entitydefs
for entity in "amp", "quot", "copy", "yen":
    print entity, "=", entities[entity]
amp = &
quot = "
copy = ©
yen = ¥
Example 5-11 shows how to combine regular expressions with
this dictionary to translate entities in a string (the opposite of
cgi.escape).
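The same regular-expression technique carries over to Python 3. Two things change there: htmlentitydefs became html.entities, and cgi.escape was removed in Python 3.8 in favor of html.escape. The following is an illustrative port of the approach in Example 5-11, not the book's listing:

```python
import re
from html import escape
from html.entities import entitydefs

# matches a named entity such as "&amp;" and captures the name
pattern = re.compile(r"&(\w+?);")

def descape_entity(m):
    # callback: translate one entity; leave unknown names as they are
    return entitydefs.get(m.group(1), m.group(0))

def descape(text):
    return pattern.sub(descape_entity, text)

print(descape("<spam&eggs>"))
print(descape(escape("<spam&eggs>")))
print(descape("&copy; 2001 &bogus;"))
```

Names that are not in the dictionary, such as &bogus;, pass through unchanged. For complete decoding, including numeric references such as &#169;, Python 3.4 and later also provide html.unescape.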
Example 5-11. Using the htmlentitydefs Module to Translate Entities
File: htmlentitydefs-example-2.py
import htmlentitydefs
import re
import cgi
pattern = re.compile(r"&(\w+?);")
def descape_entity(m, defs=htmlentitydefs.entitydefs):
    # callback: translate one entity to its ISO Latin value
    try:
        return defs[m.group(1)]
    except KeyError:
        return m.group(0) # use as is
def descape(string):
    return pattern.sub(descape_entity, string)
print descape("<spam&eggs>")
print descape(cgi.escape("<spam&eggs>"))
<spam&eggs>
<spam&eggs>
Finally, Example 5-12 shows how to escape reserved XML characters and
non-ASCII ISO Latin-1 characters for use in an XML string. This is
similar to cgi.escape, but it also replaces
non-ASCII characters.
Example 5-12. Escaping ISO Latin-1 Entities
File: htmlentitydefs-example-3.py
import htmlentitydefs
import re, string
# this pattern matches substrings of reserved and non-ASCII characters
pattern = re.compile(r"[&<>\"\x80-\xff]+")
...
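Here is a minimal Python 3 sketch of the technique behind Example 5-12. It is an illustration, not the book's listing. The names RESERVED, escape_char, escape_entity, and escape are hypothetical, chosen for this sketch. Emitting numeric character references (&#NNN;) for non-ASCII characters is also an assumption of the sketch. It was made because XML predefines only five named entities (amp, lt, gt, quot, apos), so HTML names such as &copy; are undefined in XML unless a DTD declares them.

```python
import re

# hypothetical table: reserved XML characters -> predefined entities
RESERVED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

# same character class as the pattern in Example 5-12:
# reserved characters plus the non-ASCII Latin-1 range
pattern = re.compile(r'[&<>"\x80-\xff]+')

def escape_char(ch):
    # reserved characters use named entities; everything else the
    # pattern matched (U+0080..U+00FF) becomes a numeric reference
    return RESERVED.get(ch, "&#%d;" % ord(ch))

def escape_entity(m):
    # callback: escape every character in the matched run
    return "".join(escape_char(ch) for ch in m.group())

def escape(text):
    return pattern.sub(escape_entity, text)

print(escape("<spam&eggs>"))
print(escape("caf\u00e9 \u00a9 2001"))
```

Like the listing's pattern, this sketch only covers the Latin-1 range. Characters above U+00FF pass through unchanged.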