Python & XML
By Christopher A. Jones & Fred L. Drake, Jr.

The unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

This page was updated August 25, 2003.

UNCONFIRMED errors and comments from readers:

(10) last paragraph;
"because its easier to concentrate on the actual structure": the word "its" should be "it's".

(29) first complete paragraph;
"At that point, the child's value for xml:space takes precedence for itself and it's descendants." The word "it's" should be "its".

[32] First code example;
My familiarity with XML is not very high (this book), but an XML example you give contradicts the XML formatting described earlier in the book.
Your text: What I think it should be:
The ending slash in an element is for elements that have only one tag, while in this example you are trying to show an empty element with two tags.

{70} 6 lines from bottom;
Line 31 of genxml.py is in error, so the script will not print the methods for the PyXML tree walk. The line:
    elif line.find("def") > 0 and line[:1] == ":" and inClass:
should be:
    elif line.find("def") > -1 and line[1] == ":" and inClass:

[72] middle;
The code after the line pyFile.close() is incorrectly indented. The if statement and the last statement should be at the same level as the first line of code.

{74-75} Example 3-9;
The code produces invalid HTML: the block-level element p is not allowed inside the inline element b. One could remove the b element and use, e.g., h1, h2 and h3 instead of p's, br's and nbsp's.

[90] 2nd line of Example 4-2;
The first import line in this file fails with the standard Python 2.2.1 distribution.
Searching reveals there is no module xml.dom.ext. Same problem in Example 4-3.

{101} Example 4-6;
The `if self.author:` statement is duplicated.

[101] 2nd "paragraph";
In the first "if self.author:" conditional, the subsequent "s =" statement, which is meant to run conditionally, is not indented as Python requires.

{101} Example 4-6;
Class Article has a time value, but time is never set.

[106] POSTING_FORM (constant variable);
In POSTING_FORM the form's HTTP method is "POST", but its action attribute specifies the "GET" method. I fixed it to:
POSTING_FORM = '''\

Title:

Contributor:

Author:

Contents:

'''

{107} middle;
"The query string is checked ..." is inaccurate: the data comes from stdin, because the POST method is used.

(139) beginning of section "A More Complex Example";
Probably genxml.py is meant instead of index.py. Last sentence: "...followed by ... class elements, followed by ..." should be "containing ... class elements, containing ...". Bottom: the sample code has trailing colons in the attribute values, unlike the output of genxml.py.

{140} Section "File Template";
The HTML is not valid: table elements are not allowed inside p elements, so the enclosing p should be removed. (As written, the code reads as an empty paragraph that is implicitly closed by the following table element, after which comes an unmatched </p>.)

(142) middle;
The template for "method" contains font tags, unlike the one on p. 140. The tags have no class attribute, unlike in the split code.

[146] 3rd paragraph (also Example 6-8);
To load a stylesheet and use it to transform a document into an HTML string, you have to do:
    xsltproc.appendStylesheetUri("story.xsl")
    html = xsltproc.runUri("story.xml")
But due to a bug in PyXML's XSLT implementation, appendStylesheetUri() is broken and you cannot run this program.

{148} First if statement (if not mode:);
In Example 6-8, the code listing for xslt.cgi contains: if not mode: print "" print "

No mode given

" print "" should probably be: if not mode: print "" print "

No mode given
" print ""

{161} 2nd line below Example 7-9;
"The contents of the flat file are sent by the browser in the form of a GET request": the example uses POST, not GET (p. 159, top).

[161] Example 7-9;
In Example 7-9 (flatfile.cgi), the CGI script captures the flat file from the HTTP request and presents it in the browser. To get the flat file, the following statement is used:
    flatfile = query.getvalue("flatfile","")[0]
but flatfile then contains only a '#'. Because getvalue() returns a plain string when the field has a single value, [0] picks out just its first character, so the [0] is superfluous. The corrected statement is:
    flatfile = query.getvalue("flatfile","")

[164/165] In the FlatfileParser class, on the line "if ':' in line:";
"if ':' in line:" should be "if ':' not in line:", since this condition is checking for a blank line, or rather any line without a ':'.

[172] 2nd code snippet;
The XMP tag should not be used. It was already obsolete in early HTML drafts prior to HTML 2.0, and remained obsolete in HTML 2.0, HTML 3.2 and HTML 4. The XML code could instead be presented as HTML by pretty-printing it into a StringIO and then calling cgi.escape on it. (Also on p. 174.)

(173) middle;
The code does not print a head element or a link rel="stylesheet", as on p. 170.

{189-190} The HTML code starting on p. 189;
The HTML code tries to put a table inside a p. (Actually the table start tag implicitly closes the p element, so there is an empty paragraph before the table and an unmatched </p> after the table.)

(193) bottom;
The headers in the sample do not match the ones created in Example 8-4.

(198) Line 1;
The method is named printCustomHTTPResponse, not printCustomHTTPHeaders.

{198-200} Example 8-5;
Misplaced comments: e.g., the description of method do_GET becomes the docstring of the class.

(223) all page;
The PyCalcSerial.py script uses the MS SOAP Toolkit 2.0. To get it to run with the current SOAP Toolkit 3.0, I had to make several changes to the code:

1. Inserting the version number 30 in several places:
    EndPointUrl = \
        "http://localhost/MSSoapSamples30/Calc/Service/SrSz/AspVbs/Calc.asp"
   instead of
    EndPointUrl = \
        "http://centauri/MSSoapSamples/Calc/Service/SrSz/AspVbs/Calc.asp"
   (It wasn't really clear from the text in the book that centauri is the (local) host name. Using localhost seems clearer to me.)
    connector = win32com.client.Dispatch("MSSOAP.HttpConnector30")
    ...
    serializer = win32com.client.Dispatch("MSSOAP.SoapSerializer30")
    ...
    reader = win32com.client.Dispatch("MSSOAP.SoapReader30")
   instead of
    connector = win32com.client.Dispatch("MSSOAP.HttpConnector")
    ...
    serializer = win32com.client.Dispatch("MSSOAP.SoapSerializer")
    ...
    reader = win32com.client.Dispatch("MSSOAP.SoapReader")

2. Changing the capitalization of all the SoapSerializer methods:
    # Create SOAP Envelope
    serializer.StartEnvelope()
    ...
    serializer.EndEnvelope()
   instead of
    # Create SOAP Envelope
    serializer.startEnvelope()
    ...
    serializer.endEnvelope()

3. Similarly, changing the capitalization for the SoapReader:
    # check for errors
    if reader.Fault:
        print "Error: ", reader.FaultString.text
    # Return calculation value
    return reader.RpcResult.text
   instead of
    # check for errors
    if reader.Fault:
        print "Error: ", reader.faultstring.Text
    # Return calculation value
    return reader.RPCResult.Text

[236] 3rd paragraph;
My PyXML (0.7.1) version of PrettyPrint emits an XML declaration (<?xml ...?>) at the top of the text. When the XML switch receives this and builds a DOM from the XML (bottom of p.
272), it raises an exception:
    SAXParseException: :8:13: xml processing instruction not at start of external entity
This is because the XML declaration ends up in the middle of the message, inside the tags. To fix it, I changed, on p. 236,
    PrettyPrint(newdoc, strXML)
to
    PrettyPrint(newdoc.documentElement, strXML)
This gets rid of the XML declaration.

(243) Function insertProfile;
The code does not match p. 238: it raises an exception when strXML is false, whereas on p. 238 it returned zero in that case.

{253} First (only) if statement;
The if statement has unnecessary parentheses.

{260} Method setXMLMessage;
The variables self._headerdom and self._bodydom are not used for anything. Remove them, or at least use them instead of calling getElementsByTagName repeatedly.

{266} 1/5 from bottom;
The API comment suggests that the module provides a module-level callable named sendMessage. Change
    import xsc
    responseXML = xsc.sendMessage(strXMLMessage)
into:
    import xsc
    xc = xsc.xsc()
    responseXML = xc.sendMessage(strXMLMessage)

[268] 2nd code snippet;
processXMLMessagePost removes all occurrences of "n=", but it should only remove the leading two characters. (The current code breaks XML that has any attribute whose name ends with an n.) The docstring refers to some older version of the code (also on p. 271).

[273] method echoResponse;
Replacing "n=" replaces all occurrences, which is incorrect. One could change
    msg.setXMLMessage(unquote_plus(strPostData).replace("n=",""))
into:
    msg.setXMLMessage(unquote_plus(strPostData.replace("n=","")))
While strPostData is in its encoded form, it cannot contain any occurrence of "n=" other than the leading one.

[279] middle;
qs.get("mode", "") should be qs.get("mode", [""]), and similarly for id. Or perhaps they should be written as qs.get("mode", [""])[0] to avoid repeated indexing later.

[282] 1st code snippet;
The indexing in qs.get(something, "")[0] fails if qs.get returns the default value "": the default should be [""] instead.
The code would also look nicer as a loop:
    for field in ("firstname", "lastname", "address1", "address2",
                  "city", "state", "zip"):
        value = qs.get(field, [""])[0].strip()
        xmlmsg += "<%s>%s</%s>\n" % (field, value, field)

{283} bottom;
Unnecessary parentheses in the if statements.
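The [236] report can be reproduced with a minimal Python 3 sketch. PyXML's PrettyPrint is no longer available, so this substitutes the standard library's xml.dom.minidom (an assumption, not the book's code); the sample document and the wrap() helper are hypothetical. Serializing the whole Document emits an XML declaration, and embedding that text inside another document puts the declaration mid-document, which the parser rejects. Serializing only documentElement avoids the problem.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical profile document standing in for the book's "newdoc".
newdoc = minidom.parseString("<profile><name>Jane</name></profile>")

whole = newdoc.toxml()                         # begins with an XML declaration
element_only = newdoc.documentElement.toxml()  # just the <profile> element


def wrap(fragment):
    """Embed a serialized fragment in a hypothetical message envelope."""
    return "<message><body>%s</body></message>" % fragment


# Embedding only the element produces a well-formed message.
ok_doc = minidom.parseString(wrap(element_only))

# Embedding the whole document puts "<?xml ...?>" mid-document,
# which the parser rejects.
try:
    minidom.parseString(wrap(whole))
    rejected = False
except ExpatError as exc:
    rejected = True
    print("rejected:", exc)
```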
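A minimal Python 3 sketch of the [268]/[273] fix, using urllib.parse in place of the Python 2 urllib functions. It assumes, as the reports imply, that the POST body is a single form field named n whose value is the URL-encoded XML; strip_leading_field() and the sample message are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def strip_leading_field(post_data, prefix="n="):
    """Return the decoded value of a body like "n=<url-encoded XML>".

    Only the leading field name is removed; the rest is then decoded.
    """
    if post_data.startswith(prefix):
        post_data = post_data[len(prefix):]
    return unquote_plus(post_data)


# Hypothetical message whose attribute names happen to end in "n".
xml = '<profile><name fn="Jane" ln="Doe"/></profile>'
body = "n=" + quote_plus(xml)   # "=" inside the XML is encoded as %3D

# Book-style: decode first, then drop every "n=" (this also hits fn= and ln=).
buggy = unquote_plus(body).replace("n=", "")
# Fixed: drop only the leading prefix, then decode.
fixed = strip_leading_field(body)

print(buggy)   # <profile><name f"Jane" l"Doe"/></profile>
print(fixed)   # <profile><name fn="Jane" ln="Doe"/></profile>
```

The alternative suggested for p. 273 (replace before decoding) works for the same reason: in the encoded body, every "=" that belongs to the XML appears as %3D, so only the leading "n=" matches.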
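For the [279] and [282] reports, a minimal Python 3 sketch, assuming qs is a dictionary of lists as returned by parse_qs (urllib.parse.parse_qs in Python 3). Because every value is a list, the default passed to get() must be a list as well; parse_qs also drops blank values by default, so a field submitted empty falls back to the default too. FIELDS and build_profile_xml() are hypothetical names, and the escape() call is an addition not found in the book.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

FIELDS = ("firstname", "lastname", "address1", "address2",
          "city", "state", "zip")


def build_profile_xml(query_string):
    """Turn a query string into one XML element per known field."""
    qs = parse_qs(query_string)   # maps each name to a *list* of values
    xmlmsg = ""
    for field in FIELDS:
        # The default must be a list too, so [0] works for missing fields.
        value = qs.get(field, [""])[0].strip()
        # escape() is not in the book's code; it protects &, < and >.
        xmlmsg += "<%s>%s</%s>\n" % (field, escape(value), field)
    return xmlmsg


qs = parse_qs("mode=edit&firstname=+Jane+")
mode = qs.get("mode", [""])[0]     # "edit"
ident = qs.get("id", [""])[0]      # "" instead of an IndexError

try:
    qs.get("id", "")[0]            # the book's form: "" has no index 0
    failed = False
except IndexError:
    failed = True

print(build_profile_xml("firstname=Jane&city=A%26B"))
```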