Python & XML

by Christopher A. Jones, Fred L. Drake

December 2001

Intermediate to advanced

380 pages

11h 54m

English

O'Reilly Media, Inc.

Python & XML
Dedication
A Note Regarding Supplemental Files
Preface
Audience
Organization
Conventions Used in This Book
Using Code Examples
How to Contact Us
Acknowledgments

1. Python and XML
Key Advantages of XML
Application Neutrality; Hierarchical Structure; Platform Neutrality; International Language Support
The XML Specifications
XML 1.0 Recommendation; Namespaces in XML; XML as a Foundation
The Power of Python and XML
Python Tools for XML; The SAX and DOM APIs; More Ways to Extract Information
What Can We Do with It?
2. XML Fundamentals
XML Structure in a Nutshell
Document Types and Schemas
Document Type Definitions; Alternate Schema Languages; XML Schema; TREX; RELAX-NG; Schematron
Types of Conformance
Physical Structures
Constructing XML Documents
Characters in XML Documents; The ASCII character set; The ISO-8859-1 character set; UTF-8 Encoding; Text, Character Data, and Markup; Names; Whitespace in Character Data; End-of-Line Handling; Language Identification; The Document Prolog; Start, End, and Empty Element Tags; Quotes around attribute values; Comments; Processing Instructions; CDATA Sections
Document Type Definitions
Entity Declarations; Element Type Declarations; Content models; Attribute Declarations; Attribute data types; Attribute values and constraints
Canonical XML
The Canonical XML Data Model; Document Order; Canonical XML Structure
Going Beyond the XML Specification
XML Namespaces; Extracting Information Using XPath; Using XLink to Link XML Documents; Communicating with XML Protocols; Replacing HTML with XHTML; Transforming XML with XSLT
3. The Simple API for XML
The Birth of SAX
Understanding SAX
Using SAX in an Application; SAX Handler Objects; ContentHandler; ErrorHandler; DTDHandler; EntityResolver; Other handler objects; SAX Reader Objects
Reading an Article
Writing a Simple Handler; Creating the Main Program; Adding Intelligence; Using the Additional Information
Searching File Information
Creating the Index Generator; Creating the IndexFile class; Running index.py; Searching the Index
Building an Image Index
Creating Thumbnail Images; Creating thumbnails on Windows; Implementing the SAXThumbs Handler; Viewing Your Thumbnails
Converting XML to HTML
The Generated Document; The Conversion Handler; Driving the Conversion Handler
Advanced Parser Factory Usage
Native Parser Interfaces
Using PyExpat Directly
4. The Document Object Model
The DOM Specifications
Levels of the Specification; Feature Specifications
Understanding the DOM
Python DOM Offerings
Streamlining with Minidom; Using Pulldom; 4DOM: A Full Implementation
Retrieving Information
Getting a Document Object; Loading a document using 4DOM; Loading a document using minidom; Determining a Node’s Type; Getting a Node’s Children; Getting a Node’s Siblings; Extracting Elements by Name; Examining NodeList Members; Looking at Attributes
Changing Documents
Creating New Nodes; Adding and Moving Nodes; Removing Nodes; Changing a Document’s Structure
Building a Web Application
Preparing the Web Server; Ensuring the script’s execution; Enabling write permission; The Web Application Structure; The Article class; The Storage class; Implementing Site Logic; The ArticleManager class; Controlling the Application
Going Beyond SAX and DOM
5. Querying XML with XPath
XPath at a Glance
Where Is XPath Used?
Location Paths
An Example Document; A Path Hosting Script; Getting Character Data; Specifying an Index; Testing Descendent Nodes; Testing Attributes; Selecting Elements; Additional Operators
XPath Arithmetic Operators
XPath Functions
Working with Numbers; Working with Strings; Working with Nodes
Compiling XPath Expressions
6. Transforming XML with XSLT
The XSLT Specification
XSLT Processors
Defining Stylesheets
Simplified Stylesheets; Standalone Stylesheets; Embedded Stylesheets
Using XSLT from the Command Line
XSLT Elements
The Stylesheet Element; Creating a Template Element; Applying Templates; Getting the Value of a Node; Iterating over Elements
A More Complex Example
File Template; Class Template; Method Template
Embedding XSLT Transformations in Python
Creating the Source XML; Creating a Simple Stylesheet; Creating a Stylesheet with Edit Functions; Creating the CGI Script; Selecting a Mode
Choosing a Technique
7. XML Validation and Dialects
Working with DTDs
Validating with the Internal DTD Subset; Validating with an External DTD Subset
Validation at Runtime
The BillSummary Example
The Flat File; The Web Form; Starting the CGI; Conversion and Validation; Converting text to XML; Validating the XML; Creating a validation handler; Completing the CGI; Defining success and error functions; Converting the flat file to XML; Validating the converted XML; Displaying the XML; Running the Application in a Browser
Dialects, Frameworks, and Workflow
What Does ebXML Offer?
ebXML Document Structure; Business Process and Modeling; Phases of ebXML
8. Python Internet APIs
Connecting Web Sites
Continuing Improvement; Python to the Rescue
Working with URLs
Encoding URLs; Quoting URLs; Unquoting URLs
Opening URLs
Using FTP; Retrieving URLs
Connecting with HTTP
HTTP Conversations; Request Types; Getting a Document with Python; Building a Query String with httplib; Baking Cookies for the Server; Performing a POST Operation; Creating a POST catcher; Ensuring proper URL encoding; Performing a POST with httplib; Illustrating a complete POST operation
Using the Server Classes
BaseHTTPServer Module Classes; Server Core Concepts; Instantiating a server class; Serving a GET; Serving a POST; Building a Complete Server; Running a GET request; Running a POST request
9. Python, Web Services, and SOAP
Python Web Services Support
The Emerging SOAP Standard
SOAP Messages; Exchanging SOAP Messages; Encoding SOAP Messages; Constructing SOAP Envelopes; SOAP packet requirements; SOAP encoding style; Using SOAP Headers; SOAP Body Elements; Error Message and SOAP Fault; Fault element; Fault codes; SOAP Encoding Techniques; SOAP Encoding Rules; Simple Types; Compound Types; SOAP over HTTP; The SOAPAction header; SOAP HTTP responses; SOAP for RPC
Python SOAP Options
Working with SOAPy; Working with MSSOAP; MSSOAP Serialization Basics; Adding URIs and namespaces; Creating the SOAP envelope; Making the call
Example SOAP Server and Client
Requirements for Using MSSOAP; Getting Microsoft SOAP Toolkit 2.0; Making the samples web-visible; Getting Python COM support; Fixing MSSOAP with makepy.py; Server Setup; A Python SOAP Client; Defining reusable basics
What About XML-RPC?
10. Python and Distributed Systems Design
Sample Application and Flow Analysis
Decoupling Application Systems; Routing Adds Flexibility; Routing Adds Scalability
Understanding the Scope
Building the Database
Creating a Profiles Database; Creating a Customer Table; Populating the Database
Building the Profiles Access Class
The Interfaces; Getting Profiles; Connecting with the database; Building the XML document; Returning a DOM instead of a string; Inserting and Deleting Profiles; Inserting a profile; Deleting a profile; Updating Profiles; The Complete CustomerProfile Class
Creating an XML Data Store
A Large XML File; Creating an XML Access Object; The interfaces; Using the XMLOffer class; Creating the XMLOffer class; Retrieval methods; Modification methods
The XML Switch
XML Architecture; Core XML Switch Classes; The XMLMessage Class; XMLMessage format; XMLMessage class; XML message code architecture; XMLMessage code listing; The XML Switch Service; The XML Switch Client; Using postMsg.html to send back XML; Using the XSC client; Using the XSC API; The XMLSwitchHandler Server Class; XMLSwitchHandler code architecture; XMLSwitchHandler listing
Running the XML Switch
A Web Application
Connecting to a Web Service; The Components; The Topology; The Code Architecture; The CGI Functionality; Extracting profile information; Updating profile information; Displaying all offers; The Complete sp.py Listing; Running the Site as a User
A. Installing Python and XML Tools
Installing Python
Windows; Linux and Unix
Installing PyXML
Installing 4Suite
B. XML Definitions
XML Definitions
C. Python SAX API
Convenience Functions
XMLReader
ContentHandler
DTDHandler
EntityResolver
InputSource
ErrorHandler
DeclHandler
LexicalHandler
Locator
SAX Exceptions
D. Python DOM API
DOMException
DOMImplementation
DocumentFragment
Document
Node
NodeList
NamedNodeMap
CharacterData
Attr
Element
Text
Comment
CDATASection
DocumentType
Notation
Entity
EntityReference
ProcessingInstruction
4DOM Extensions
E. Working with MSXML3.0
Setting Up MSXML3.0
Basic DOM Operations
MSXML Nodes; Using a NodeList
MSXML3.0 Support for XSLT
Source XML; XSL Stylesheet; Running an MSXML Transformation
Handling Parsing Errors
MSXML3.0 Reference
MSXML3.0 Document Object
MSXML3.0 Node Object
MSXML3.0 NamedNodeMap Object
MSXML3.0 NodeList Object
MSXML3.0 ParseError Object
F. Additional Python XML Tools
Pyxie
Python XML Tools
XML Schema Validator
Sab-pyth
Redfoot
XML Components for Zope
Parsed XML; Page Templates
Online Resources
Index
Colophon
Copyright

Content preview from Python & XML

Working with URLs

A URL packs a great deal of information into a single string. It tells you the name of the server, the name of the file on the server, any data that you are supplying to generate a dynamic response, and even the protocol to use to retrieve the information. In basic form, URLs look like this:

http://www.oreilly.com/oreilly/about.html

This URL has three elements. The first section tells you (or your software) the protocol in use for this resource. In this case, it is HTTP, shown by http:. The next section indicates the server name and its corresponding domain. In this case the server is named www, and the domain is oreilly.com, coming together as //www.oreilly.com. What follows is a pathname (/oreilly/) and a filename (about.html). Your browser uses this information to reach the brilliant conclusion that it should use HTTP to connect to www in oreilly.com and retrieve the /oreilly/about.html file.
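
As a rough illustration of the breakdown just described, the sketch below pulls the same pieces out of the example URL. It assumes Python 3's standard urllib.parse module, which is not necessarily what the book itself uses, and the variable names are chosen here for illustration only.

```python
from urllib.parse import urlsplit

url = "http://www.oreilly.com/oreilly/about.html"
parts = urlsplit(url)

# The protocol (scheme) is everything before the first colon.
protocol = parts.scheme

# The host is the server name followed by its domain.
server, _, domain = parts.hostname.partition(".")

# The path splits into a directory portion and a filename.
directory, _, filename = parts.path.rpartition("/")
directory += "/"

print(protocol)   # http
print(server)     # www
print(domain)     # oreilly.com
print(directory)  # /oreilly/
print(filename)   # about.html
```

Splitting the host on its first dot matches the text's description of www and oreilly.com, but it is only a convenience for this example and would not hold for every hostname.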

Of course, URLs can become more complicated. If you type “Python” into a search box and click Submit, your browser may go after a URL similar to the following:

http://search.oreilly.com/cgi-bin/search?term=Python&category=All&pref=all
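
To make the origin of such a URL more concrete, here is a minimal sketch of how a client might assemble it from form fields. It assumes Python 3's urllib.parse; the form_fields dictionary is a hypothetical stand-in for whatever the search form actually submits.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical form fields, mirroring the names in the URL above.
form_fields = {"term": "Python", "category": "All", "pref": "all"}

# urlencode joins name=value pairs with "&" and escapes unsafe characters.
query = urlencode(form_fields)

# Components in order: scheme, host, path, query, fragment.
search_url = urlunsplit(
    ("http", "search.oreilly.com", "/cgi-bin/search", query, "")
)

print(search_url)
# http://search.oreilly.com/cgi-bin/search?term=Python&category=All&pref=all
```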

Now there are several more items to examine. First, the server has changed from www to search. Second, the path has changed from /oreilly/ to /cgi-bin/. The filename about.html has been replaced with a target named search. But most interesting are the question mark and the data that follows it:

?term=Python&category=All&pref=all

This portion ...
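
The data after the question mark can be taken apart into name/value pairs. The following is a small sketch, again assuming Python 3's urllib.parse rather than any module named in the preview:

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

url = "http://search.oreilly.com/cgi-bin/search?term=Python&category=All&pref=all"

# The query attribute holds everything after the "?", without the "?" itself.
query = urlsplit(url).query

# parse_qsl keeps the pairs in order; parse_qs groups values by name,
# using lists because a name may appear more than once in a query.
pairs = parse_qsl(query)
fields = parse_qs(query)

print(query)   # term=Python&category=All&pref=all
print(pairs)   # [('term', 'Python'), ('category', 'All'), ('pref', 'all')]
print(fields["term"])  # ['Python']
```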

Publisher Resources

ISBN: 0596001282
