Python & XML
By Christopher A. Jones & Fred L. Drake, Jr.
The unconfirmed error reports are from readers. They have not yet been
approved or disproved by the author or editor and represent solely the
opinion of the reader.
Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification
This page was updated August 25, 2003.
UNCONFIRMED errors and comments from readers:
(10) last paragraph;
"because its easier to concentrate on the actual structure"
The word "its" should be "it's".
(29) first complete paragraph;
"At that point, the child's value for xml:space takes precedence for itself and it's
descendants."
The word "it's" should be "its".
[32] First code example;
My familiarity with XML is not very high (this book), but an XML example you give
contradicts XML formatting described earlier in the book.
Your text:
What I think it should be:
The ending slash in an element is for elements that only have one tag, while in this
example you are trying to show an element with two tags that is empty.
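A quick check, offered as a sketch and not as a ruling on this report: the XML 1.0 specification allows an empty-element tag for any element that has no content, so a parser should treat both spellings alike. The element name "note" is made up for illustration, and the standard library's minidom stands in for whatever parser the book uses at this point.

```python
from xml.dom import minidom

# Two spellings of an empty element; the name "note" is hypothetical.
short_form = minidom.parseString("<note/>")
long_form = minidom.parseString("<note></note>")

# Both produce an element with the same name and no child nodes.
for doc in (short_form, long_form):
    element = doc.documentElement
    print(element.tagName, len(element.childNodes))
```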
{70} 6 lines from bottom;
The 31st line of genxml.py is in error and so will not print the methods for the
PyXML tree walk.
the line:
elif line.find("def") > 0 and line[:1] == ":" and inClass:
should be:
elif line.find("def") > -1 and line[1] == ":" and inClass:
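The first half of this report can be checked in isolation. A minimal sketch, assuming a source line that begins with "def" (the sample line is hypothetical, and the code is written for a current Python 3 interpreter, not the Python 2 of the book):

```python
line = "def walk(self):"   # hypothetical source line

print(line.find("def"))        # index of the match; 0 when the line starts with "def"
print(line.find("def") > 0)    # False: a match at index 0 is missed
print(line.find("def") > -1)   # True: -1 is the "not found" value
print("def" in line)           # True: the usual spelling of the same test
```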
[72] middle;
The code after line
pyFile.close()
is incorrectly indented. The if statement and the last statement should be at the
same level as the first line of code.
{74-75} Example 3-9;
The code produces invalid HTML: block-level element p is not allowed inside inline
element b. Could remove the b element and use e.g. h1, h2 and h3 instead of p's, br's
and nbsp's.
[90] 2nd line of Example 4-2;
The first import line in this file fails with the standard Python 2.2.1 distribution.
Searching reveals there is no module xml.dom.ext. Same problem in Example 4-3.
{101} Example 4-6;
`if self.author:` statement is duplicated.
[101] 2nd "paragraph";
In the first "if self.author:" conditional, the subsequent "s =" statement, intended
to be conditionally executed, is not indented as required by Python.
{101} Example 4-6;
class Article has time value, but time is not set.
[106] POSTING_FORM (constant);
The form in "POSTING_FORM" uses the HTTP method "POST", but its "action" attribute
uses the "GET" method.
I fixed it to:
POSTING_FORM = '''\
'''
{107} middle;
"The query string is checked ..." is inaccurate: the data comes from stdin, as POST
method is used.
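The distinction can be sketched as follows. This is an illustration under stated assumptions, not the book's code: read_form is a hypothetical helper, the field name "comment" is invented, and a StringIO object stands in for standard input.

```python
import io
from urllib.parse import parse_qs


def read_form(environ, stream):
    """Return form fields from the query string (GET) or the body (POST)."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # A POST body arrives on standard input; CONTENT_LENGTH gives its size.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stream.read(length)
    else:
        # A GET request carries its fields in the query string instead.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# Simulated POST request; the field name "comment" is hypothetical.
body = "comment=hello+world"
fields = read_form(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))},
    io.StringIO(body),
)
print(fields)
```

In a real CGI script the two arguments would be os.environ and sys.stdin.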
(139) beginning of section "A More Complex Example";
Probably genxml.py is meant instead of index.py.
Last sentence:
"...followed by ... class elements, followed by ..."
should be
"containing ... class elements, containing ..."
Bottom: the sample code has trailing colons in the attribute values,
unlike the output of genxml.py.
{140} Section "File Template";
The HTML is not valid: table elements are not allowed inside p elements. The
enclosing p should be removed. (The current code reads as an empty paragraph which is
implicitly closed by the following table element, after which comes an unmatched
</p>.)
(142) middle;
The template for "method" contains font tags, unlike on p. 140. The tags have no
class attribute, unlike in the split code.
[146] 3rd Paragraph;
(also Example 6.8) To load a stylesheet and transform a document into an HTML string,
you have to call:
xsltproc.appendStylesheetUri("story.xsl")
html = xsltproc.runUri("story.xml")
But because of an error in the XSLT support of PyXML, appendStylesheetUri() is broken
and you cannot run this program.
{148} First if statement (if not mode:);
in Example 6-8, the code listing for xslt.cgi contains:
if not mode:
print ""
print "No mode given
"
print ""
should probably be:
if not mode:
print ""
print "No mode given
"
print ""
{161} 2nd line below Example 7-9;
"The contents of the flat file are sent by the browser in the form of a GET request":
The example uses POST, not GET (p. 159 top)
[161] Example 7-9;
In Example 7-9 (flatfile.cgi), the CGI script captures the flat file from the HTTP
request and presents it in the browser. To get the flat file, the following statement
is used:
flatfile = query.getvalue("flatfile","")[0]
With this statement, flatfile contains only a '#'. To get the whole form value you
have to drop the [0]; the index is superfluous here. The statement in its correct
form would be:
flatfile = query.getvalue("flatfile","")
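Why the index yields a single '#' can be shown with a plain string. A small sketch, assuming the form field arrives as one string; the sample text is hypothetical, and a plain string is used instead of the cgi module so the snippet runs on any current Python.

```python
# Hypothetical flat-file text as it might arrive from the form field.
posted = "# sample flat file\nname: value\n"

print(repr(posted[0]))   # '#': indexing a string yields one character
print(repr(posted))      # the whole submission, which is what the parser needs
```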
[164/165] In FlatfileParser class on line "if ':' in line:";
if ':' in line:
should be
if ':' not in line:
since this condition is checking for a blank line or rather any line without a :
[172] 2nd code snippet;
The XMP tag should not be used. It was obsolete already in early HTML drafts prior to
HTML 2.0, and remained obsolete in HTML 2.0, HTML 3.2 and HTML 4. The XML code could
be presented as HTML by PrettyPrinting it in a StringIO and then calling cgi.escape
on it. (On p. 174 also.)
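The suggestion could look roughly like this on a current Python. Two substitutions are assumptions of this sketch: minidom's toprettyxml() replaces PyXML's PrettyPrint, and html.escape replaces cgi.escape, which was removed in Python 3.8. The sample document is hypothetical.

```python
import html
from xml.dom import minidom

# Hypothetical document standing in for the XML the script displays.
doc = minidom.parseString("<profile><name>A &amp; B</name></profile>")

# Pretty-print to a string, then escape markup characters for use in HTML.
pretty = doc.documentElement.toprettyxml(indent="  ")
page_fragment = "<pre>" + html.escape(pretty) + "</pre>"
print(page_fragment)
```

A pre element keeps the indentation visible without relying on the obsolete XMP tag.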
(173) middle;
The code does not print a head or link rel="stylesheet", as on page 170.
{189-190} The HTML code starting on p. 189;
The HTML code tries to put a table inside a p. (Actually the <table> start tag
implicitly closes the p element, so that there's an empty paragraph before the table,
and an unmatched </p> after the table.)
(193) bottom;
The headers in the sample do not match the ones created in Example 8-4.
(198) Line 1;
The method is named printCustomHTTPResponse, not printCustomHTTPHeaders.
{198-200} Example 8-5;
Misplaced comments: e.g. the description of method do_GET becomes the docstring of
the class.
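The effect described here follows from how Python assigns docstrings: a string literal that is the first statement of a class body documents the class, not the method below it. A minimal sketch with hypothetical class names (not the classes from Example 8-5):

```python
class Misplaced:
    "Respond to a GET request."   # intended for do_GET, but documents the class

    def do_GET(self):
        return "ok"


class Fixed:
    "Hypothetical request handler."

    def do_GET(self):
        "Respond to a GET request."
        return "ok"


print(Misplaced.__doc__)          # the text meant for the method
print(Misplaced.do_GET.__doc__)   # None: the method has no docstring
print(Fixed.do_GET.__doc__)       # the text now sits where it belongs
```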
(223) all page;
The PyCalcSerial.py script uses the MS SOAP toolkit 2.0. To get it to run with the
current SOAP toolkit 3.0 I had to make several changes in the code:
1. Inserting the version number 30 at several places:
EndPointUrl = \
    "http://localhost/MSSoapSamples30/Calc/Service/SrSz/AspVbs/Calc.asp"
instead of
EndPointUrl = \
    "http://centauri/MSSoapSamples/Calc/Service/SrSz/AspVbs/Calc.asp"
(It wasn't really clear from the text in the book that centauri is the (local)
hostname. Using localhost seems clearer to me.)
connector = win32com.client.Dispatch("MSSOAP.HttpConnector30")
...
serializer = win32com.client.Dispatch("MSSOAP.SoapSerializer30")
...
reader = win32com.client.Dispatch("MSSOAP.SoapReader30")
instead of
connector = win32com.client.Dispatch("MSSOAP.HttpConnector")
...
serializer = win32com.client.Dispatch("MSSOAP.SoapSerializer")
...
reader = win32com.client.Dispatch("MSSOAP.SoapReader")
2. Changing the capitalization of all the SoapSerializer methods:
# Create SOAP Envelope
serializer.StartEnvelope()
...
serializer.EndEnvelope()
instead of
# Create SOAP Envelope
serializer.startEnvelope()
...
serializer.endEnvelope()
3. Similarly changing the capitalization for the SoapReader:
# check for errors
if reader.Fault:
    print "Error: ", reader.FaultString.text
# Return calculation value
return reader.RpcResult.text
instead of
# check for errors
if reader.Fault:
    print "Error: ", reader.faultstring.Text
# Return calculation value
return reader.RPCResult.Text
[236] 3rd paragraph;
My PyXML's (0.7.1) version of PrettyPrint writes XML processing directives (such as
the XML declaration) at the top of the text. When the XML switch gets this and creates
a DOM from XML (bottom of p. 272) it raises an exception:
SAXParseException: <unknown>:8:13: xml processing instruction not at start of
external entity
This is because the XML processing directives get stuck in the middle of the message
inside the tags.
To fix it, I changed on p.236
PrettyPrint(newdoc, strXML)
to
PrettyPrint(newdoc.documentElement, strXML)
This gets rid of the XML processing directives.
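The same difference between serializing the document and serializing only its root element can be seen with the standard library. This sketch swaps in minidom's toxml() for PyXML's PrettyPrint, which is an assumption made so that the snippet runs on a current Python; the sample message is hypothetical.

```python
from xml.dom import minidom

# Hypothetical message document.
newdoc = minidom.parseString("<message><body>hi</body></message>")

whole = newdoc.toxml()                      # document: begins with an XML declaration
root_only = newdoc.documentElement.toxml()  # root element: markup only

print(whole)
print(root_only)
```

Only the second string can be embedded inside another XML message without tripping the parser.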
(243) Function insertProfile;
The code does not match p. 238: it raises an exception when strXML is false, whereas
on p. 238 it returned zero in that case.
{253} First (only) if statement;
The if statement has unnecessary parentheses.
{260} Method setXMLMessage;
The variables self._headerdom and self._bodydom are not used for anything. Remove
them, or at least use them instead of calling getElementsByTagName repeatedly.
{266} 1/5 from bottom;
The API comment suggests that the module provides a callable named sendMessage.
Change
import xsc
responseXML = xsc.sendMessage(strXMLMessage)
into:
import xsc
xc = xsc.xsc()
responseXML = xc.sendMessage(strXMLMessage)
[268] 2nd code snippet;
processXMLMessagePost removes all occurrences of "n=". It should only remove the
leading two characters. (The current code breaks XML that has any attribute whose name
ends with an n.)
The docstring refers to some older version of the code (on p. 271 also).
[273] method echoResponse;
The replacement of "n=" replaces all occurrences, which is incorrect.
Could change
msg.setXMLMessage(unquote_plus(strPostData).replace("n=",""))
into:
msg.setXMLMessage(unquote_plus(strPostData.replace("n=","")))
While strPostData is in its encoded form, it cannot contain more than the leading
occurrence of "n=".
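The difference between the two orderings can be demonstrated directly. A minimal sketch, assuming the message is posted URL-encoded in a field named "n" as the report implies; the sample message and its attribute "btn" are hypothetical, and the parse_qs line is an alternative not taken from the book.

```python
from urllib.parse import parse_qs, quote_plus, unquote_plus

# Hypothetical message; the attribute name "btn" ends with "n".
message = '<item btn="x"/>'
str_post_data = "n=" + quote_plus(message)

# Replacing after decoding also hits the 'n=' inside the attribute.
broken = unquote_plus(str_post_data).replace("n=", "")

# Replacing before decoding is safe: every '=' in the value is encoded as %3D.
suggested = unquote_plus(str_post_data.replace("n=", ""))

# Alternative: let the query-string parser separate the name from the value.
parsed = parse_qs(str_post_data)["n"][0]

print(broken)
print(suggested)
print(parsed)
```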
[279] middle;
qs.get("mode", "") should be qs.get("mode", [""]), and similarly for id. Or perhaps
they should be written like qs.get("mode", [""])[0] to avoid repeated indexing later.
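The reason for the list default is that a parsed query string maps each name to a list of values. A short sketch using urllib.parse.parse_qs from a current Python as a stand-in for the parser used in the book; the query string is hypothetical.

```python
from urllib.parse import parse_qs

qs = parse_qs("id=42")   # hypothetical query string with no "mode" field

print(qs)                        # every value is a list of strings
print(qs.get("mode", [""])[0])   # list default: empty string, no exception

try:
    qs.get("mode", "")[0]        # string default: nothing to index
except IndexError as exc:
    print("IndexError:", exc)
```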
[282] 1st code snippet;
The indexing in qs.get(something, "")[0] fails, if qs.get returns the default value
"": the default should be [""] instead.
The code would also look nicer as a loop:
for field in ("firstname", "lastname", "address1", "address2",
              "city", "state", "zip"):
    value = qs.get(field, [""])[0].strip()
    xmlmsg += "<%s>%s</%s>\n" % (field, value, field)
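A self-contained variant of the loop, for trying it outside the CGI script. The query string is hypothetical, and the call to escape() is an addition of this sketch (it is not in the book or in the report) so that characters such as & in user input do not break the generated XML.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical submission; most fields are deliberately missing.
qs = parse_qs("firstname=Ann&lastname=Lee&city=Oslo+%26+Co")

xmlmsg = ""
for field in ("firstname", "lastname", "address1", "address2",
              "city", "state", "zip"):
    # Missing fields fall back to [""], so indexing never fails.
    value = qs.get(field, [""])[0].strip()
    # escape() is an addition here: it protects &, < and > in user input.
    xmlmsg += "<%s>%s</%s>\n" % (field, escape(value), field)

print(xmlmsg)
```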
{283} bottom;
Unnecessary parentheses in the if statements