XML Hacks

Name: XML Hacks
Author: Michael Fitzgerald
ISBN: 9780596007119

by Michael Fitzgerald

July 2004

Intermediate to advanced

479 pages

12h 30m

English

O'Reilly Media, Inc.

XML Hacks
A Note Regarding Supplemental Files
Credits
Author
Contributors
Preface
Why XML Hacks?
How This Book Is Organized
Conventions Used in This Book
Using Code Examples

How to Contact Us
Got a hack?
Acknowledgments
1. Looking at XML Documents
Hacks #1-10
1. Read an XML Document
The XML Declaration; Version information; The encoding declaration; The standalone declaration; Comments; Elements; Mixed content; Attributes; Character references; CDATA Sections; See Also
2. Display an XML Document in a Web Browser
3. Apply Style to an XML Document with CSS
Processing Instructions; Cascading Style Sheets; Applying a Stylesheet to an XML Document; See Also
4. Use Character and Entity References
Character References; The xml:lang attribute; Entity References
5. Examine XML Documents in Text Editors
Vim; Emacs with nXML; See Also
6. Explore XML Documents in Graphical Editors
xmlspy; xRay2; <oXygen/>; See Also
7. Choose Tools for Creating an XML Vocabulary
Well-Formedness, Validation, and Schemas; DTD; Other schema languages; Namespaces; See Also
8. Test XML Documents Online
RUWF; RXP; Brown University’s Validation Form
9. Test XML Documents from the Command Line
RXP; xmlvalid; xmllint; xmlwf
10. Run Java Programs that Process XML
JAR Files; The Java Classpath; Using a JAR File as an Executable on Windows 2000 or XP
2. Creating XML Documents
Hacks #11-30
11. Edit XML Documents with <oXygen/>
12. Edit XML Documents with Emacs and nXML
Spotting Validity Errors in Real Time; Getting Help with nXML; Using Context-Sensitive Completion; Making nXML Work Your Way; Entering and Displaying Special Characters; See Also
13. Edit XML with Vim
Basic Configuration; Syntax Highlighting; Indentation; Folding; Automation; See Also
14. Edit XML Documents with Microsoft Word 2003
Attaching Schemas to Word; Using XSLT with Word 2003; Saving Word 2003 Files as XML; See Also
15. Work with XML in Microsoft Excel 2003
See Also
16. Work with XML in Microsoft Access 2003
See Also
17. Convert Microsoft Office Files, Old or New, to XML
DocBook
18. Create an XML Document from a Text File with xmlspy
See Also
19. Convert Text to XML with Uphill
Trying It Out; How the Code Works; The markup class; The uphill class; Summary; See Also
20. Create Well-Formed XML with Minimal Manual Tagging Using an SGML Parser
From HTML to XML; Marking Up the Names of People; See Also
21. Create an XML Document from a CSV File
See Also
22. Convert an HTML Document to XHTML with HTML Tidy
23. Transform Documents with XQuery
See Also
24. Execute an XQuery with Saxon
Executing XQuery from a File Using Saxon; Piping Queries to Saxon; Executing XQuery from Java Using XQJ; Executing XQuery on the Web; See Also
25. Include Text and Documents with Entities
Unparsed Entities and Notations
26. Include External Documents with XInclude
See Also
27. Encode XML Documents
ISO/IEC 8859; UTF-8 and UTF-16; The Byte Order Mark; See Also
28. Explore XLink and XML
XML Base; XLink; Other XLink Functionality; Extended links; XLink linkbases; See Also
29. What’s the Diff? Diff XML Documents
DecisionSoft’s xmldiff; DeltaXML’s XML Comparator; IBM’s XML Diff and Merge Tool; See Also
30. Look at XML Documents Through the Lens of the XML Information Set
3. Transforming XML Documents
Hacks #31-58
31. Understand the Anatomy of an XSLT Stylesheet
The Document Element; Templates; Using apply-templates; A literal result element; The attribute value template; The copy-of and copy elements
32. Transform an XML Document with a Command-Line Processor
Saxon; Instant Saxon; Full Java version of Saxon; Xalan; MSXSL
33. Transform an XML Document Within a Graphical Editor
xmlspy; xRay2; <oXygen/>; See Also
34. Analyze Nodes with TreeViewer
35. Explore a Document Tree with the xmllint Shell
xmllint Shell Commands
36. View Documents as Tables Using Generic CSS or XSLT
37. Generate an XSLT Identity Stylesheet with Relaxer
38. Pretty-Print XML Using a Generic Identity Stylesheet and Xalan
39. Create a Text File from an XML Document
Built-in Templates
40. Convert Attributes to Elements and Elements to Attributes
Element-to-Attribute Conversion; Attribute-to-Element Conversion; See Also
41. Convert XML to CSV
See Also
42. Create and Process SpreadsheetML
43. Choose Your Output Format in XSLT
44. Transform Your iTunes Library File
45. Generate Multiple Output Documents with XSLT 2.0
46. Generate XML from MySQL
47. Generate PDF Documents from XML and CSS
48. Process XML Documents with XSL-FO and FOP
XSL-FO Basics; Generating a PDF; See Also
49. Process HTML with XSLT Using TagSoup
Using TagSoup and TSaxon
50. Build Results with Literal Result and Instruction Elements
Literal Result Elements and Literal Text; Instruction Elements
51. Write Push and Pull Stylesheets
52. Perform Math with XSLT
53. Transform XML Documents with grep and sed
grep; sed; See Also
54. Generate SVG with XSLT
See Also
55. Dither Scatterplots with XSLT and SVG
56. Use Lookup Tables with XSLT to Translate FIPS Codes
The FIPS Code Example; Putting the Lookup Table in the Stylesheet; Running the Hack
57. Grouping in XSLT 1.0 and 2.0
Grouping with XSLT 1.0; Grouping with XSLT 2.0; See Also
58. Use EXSLT Extensions
EXSLT’s date:date( ), date:time( ), and math:lowest( ) Functions; EXSLT’s exsl:node-set Function; See Also
4. XML Vocabularies
Hacks #59-67
59. Use XML Namespaces in an XML Vocabulary
See Also
60. Create an RDDL Document
See Also
61. Create and Validate an XHTML 1.0 Document
See Also
62. Create Books, Technical Manuals, and Papers in XML with DocBook
See Also
63. Create a SOAP 1.2 Document
See Also
64. Identify Yourself with FOAF
The FOAF Vocabulary; Personal Metadata; Identifying Marks; It’s Who You Know; Finer-Grained Relationships; Image Is Everything; Publishing FOAF Data; See Also
65. Unravel the OpenOffice File Format
See Also
66. Render Graphics with SVG
See Also
67. Use XForms in Your XML Documents
Anatomy of an XForms Document; Simple Approaches to Trying Out XForms; A Working Example; See Also
5. Defining XML Vocabularies with Schema Languages
Hacks #68-79
68. Validate an XML Document with a DTD
External Subset; The text declaration; Element type declarations and content models; Attribute-list declarations; Internal Subset; Using an internal subset and an external subset together; Parameter Entities; Other Things That Can Go in a DTD; Comments; Conditional sections; Unparsed entities and notations
69. Validate an XML Document with XML Schema
A Quick Introduction to XML Schema; Validation with XML Schema Tools; XSD Schema Validator; xmllint; xsv; Other XML Schema Features; See Also
70. Validate Multiple Documents Against an XML Schema at Once
71. Check the Integrity of a W3C Schema
72. Validate an XML Document with RELAX NG
XML Syntax; xmllint; Jing; A more complex RELAX NG schema; Compact Syntax; Jing with compact syntax; RNV; A more complex RELAX NG schema in compact syntax; See Also
73. Create a DTD from an Instance
Trang; Relaxer; DTDGenerator; xmlspy
74. Create an XML Schema Document from an Instance or DTD
LuMriX.net’s DTD2XS; Microsoft XSD Inference 1.0; Trang; Relaxer; xmlspy
75. Create a RELAX NG Schema from an Instance
Trang (XML Syntax); Relaxer (XML Syntax); Trang (Compact Syntax)
76. Convert a RELAX NG Schema to XML Schema
77. Use RELAX NG and Schematron Together to Validate Business Rules
Pulling Schematron Out of RELAX NG; See Also
78. Use RELAX NG to Generate DTD Customizations
Generating an RNC Schema; Flattening your DTD; Generating an RNC schema from your flattened DTD; Creating an RNC Schema Customization File; Compiling Your Customization File; Converting Your RNC Customization File to RNG XML Syntax; Using incelim.xsl to Compile Your RNG Customization File; Generating Your DTD Subset
79. Generate Instances Based on Schemas
Generating an Instance with xmlspy; Generating an Instance with the Sun Instance Generator; See Also
6. RSS and Atom
Hacks #80-90
80. Subscribe to RSS Feeds
Radio UserLand; AmphetaDesk; NewsGator; See Also
81. Create an RSS 0.91 Document
82. Create an RSS 1.0 Document
See Also
83. Create an RSS 2.0 Document
See Also
84. Create an Atom Document
Feed Entries; See Also
85. Validate RSS and Atom Documents
See Also
86. Create RSS with XML::RSS
See Also
87. Syndicate Content with Movable Type
Syndicating the Whole Post; Including Trackback Links; Creating Specialized Feeds; Category Feeds; Syndicating Comments
88. Post RSS Headlines on Your Site
The Code; Running the Hack
89. Create RSS 0.91 Feeds from Google
90. Syndicate a List of Books from Amazon with RSS and ASP
What You Need; The Code; Running the Hack
7. Advanced XML Hacks
Hacks #91-100
91. Pipeline XML with Ant
Validating an XML Document; The Jing Task; An XML Pipeline Example; See Also
92. Use Elements Instead of Entities to Avoid the “amp Explosion Problem”
See Also
93. Use Cocoon to Create a Well-Formed View of a Web Page, Then Scrape It for Data
Cocoon in 60 Seconds; Running the Hack; Extending the Hack
94. From Wiki to XML, Through SGML
SGML: A Language for Describing Wikis; An SGML Document Type for Wiki; Which Wiki?; Wiki as SGML
95. Create Well-Formed XML with JavaScript
The Element Function; Adding Attributes; Extending the Hack; Creating Large Chunks of XML; See Also
96. Inspect and Edit XML Documents with the Document Object Model
DOM Inspector; Python’s minidom; DOM in Java; See Also
97. Processing XML with SAX
A Little Help from SAX; See Also
98. Process XML with C#
Getting C#; Writing an XML Document with XmlTextWriter; Reading XML; See Also
99. Generate Code from XML
Using Relaxer to Generate Java; Using xmlspy to Generate C#; See Also
100. Create Well-Formed XML with Genx
Setting Up Genx; Compiling Genx; A First Example; Declare Markup for Better Performance
Index
About the Author
Colophon
Copyright

Content preview from XML Hacks

Process HTML with XSLT Using TagSoup

Use TSaxon, a variant of Saxon, and TagSoup to help transform HTML.

XSLT stylesheets are the standard way to transform XML documents in one format into other formats: HTML, XHTML, plain text, or XML in a different vocabulary.

There are many XSLT processors. Michael Kay’s Saxon Version 6.5.3 (http://saxon.sourceforge.net/#F6.5.3) is a particularly mature and successful implementation of XSLT 1.0 and XPath 1.0. It is packaged as a Java JAR file called saxon.jar, which you can download with the 6.5.3 distribution from the Saxon site on SourceForge.

Now suppose, for example, that we want to extract just the header elements (h1, h2, h3, etc.) from an XHTML document and display them as progressively indented plain text (i.e., each h1 element is unindented, each h2 element is indented by a single space, each h3 by two spaces, etc.).

The XSLT stylesheet outline.xsl does exactly what we want. It specifies an output method of text, and matches the h1 through h6 elements in the XHTML input, taking the content of each one and prepending the correct number of spaces. The textual content of other elements is suppressed.
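
The stylesheet itself is not reproduced in this preview. As a rough illustration of the behavior described above, here is a minimal sketch written in Python rather than XSLT. It is not outline.xsl: the sample document, the INDENT table, and the outline function are all hypothetical, and the sketch assumes the input is well-formed XHTML in the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample input for illustration; not the book's outline.xhtml.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <h1>Introduction</h1>
    <p>Body text that the outline should suppress.</p>
    <h2>Concepts</h2>
    <h3>Templates</h3>
    <h2>Notation</h2>
  </body>
</html>"""

# Map each namespace-qualified heading tag to its indent width:
# h1 -> 0 spaces, h2 -> 1 space, ..., h6 -> 5 spaces.
INDENT = {"{%s}h%d" % (XHTML_NS, level): level - 1 for level in range(1, 7)}


def outline(xhtml_text):
    """Return the headings of an XHTML document as indented plain text."""
    root = ET.fromstring(xhtml_text)
    lines = []
    # iter() walks the tree in document order, so headings keep their sequence.
    for elem in root.iter():
        if elem.tag in INDENT:
            # Join all text inside the heading and normalize whitespace.
            text = " ".join("".join(elem.itertext()).split())
            lines.append(" " * INDENT[elem.tag] + text)
    # Text from every other element (title, p, ...) is simply never emitted.
    return "\n".join(lines)


if __name__ == "__main__":
    print(outline(SAMPLE))
```

Like the stylesheet described above, the sketch keeps only heading text and drops everything else. It relies on a strict XML parser, so it would reject HTML that is not well-formed.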

The following command, executed in your working directory, will process outline.xsl and the XHTML document outline.xhtml using Saxon and will display the resulting indented plain text:

java -jar saxon.jar outline.xhtml outline.xsl

It so happens that outline.xhtml contains only h1, h2, and h3 elements (borrowed ...

Publisher Resources

ISBN: 0596007116
