XML Pocket Reference
XML Pocket Reference, Third Edition By Simon St. Laurent, Michael Fitzgerald
August 2005
Pages: 175


Chapter 1: XML Pocket Reference
Introduction
After several years of incredible hype, XML, the Extensible Markup Language, has settled down to become a respectable part of developers' toolboxes. XML's structured, text-based format has made it easy for programming languages and environments to support it, making XML the lingua franca of the data exchange world. XML wasn't the first way to do this, but it was the first that successfully attained approachable simplicity while representing complex data structures.
XML provides its users with tremendous flexibility. It defines a set of hierarchical structures for containing content, but leaves the details of those structures, including their names, to the people who create XML vocabularies. XML's common structures make it possible to create parsers and other toolkits that work on any legal XML out there, while still allowing customization of the data stored in those documents. Developers can do generic processing on XML documents as well as create applications that understand particular types of XML documents.
This reference covers the core of the XML standards for representing data, including the core structures of XML 1.0 and 1.1, namespaces, and schema languages for describing XML vocabularies. It doesn't cover tools for processing XML.
In this latest edition of the book, Extensible Stylesheet Language Transformations (XSLT) has been moved to a new, well-earned location in a separate O'Reilly book—the XSLT 1.0 Pocket Reference—to make room here for schema information.
Additional content appearing in this section has been removed.
XML Structures
Everything in an XML document is text, typically Unicode text. Special characters (primarily < and >, but also &, ", and ') are used to separate the text that identifies document structures from the text contained in those structures. The text that represents the structure of the document is called markup, as historically it was extra information added to text documents to provide metadata, formatting, or other information. Adding this information to a document is referred to as "marking up" the document, although text and markup are usually created simultaneously now.
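As a minimal sketch of how these special characters are kept apart from markup, the Python standard library's xml.sax.saxutils.escape can replace them with the predefined entity references before the text is placed in a document. The sample string and the wrapper element name m are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing all five special characters.
raw = """AT&T says "1 < 2" and 'b > a'"""

# escape() always replaces &, < and >. The two quote characters are
# passed in explicitly so that they are replaced as well.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Wrapping the escaped text in an element (the name "m" is arbitrary)
# and parsing it gives back the original characters.
round_trip = ET.fromstring(f"<m>{escaped}</m>").text
```

Here escaped holds only entity references where the special characters were, and round_trip equals raw again after parsing.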
As each structure is discussed, applicable productions from the XML 1.0 and 1.1 specs will be listed in the order in which they appear in the specs. However, productions for Letter, BaseChar, Ideographic, CombiningChar, Digit, and Extender are omitted here for the sake of brevity (refer to Appendix B in the 1.0 spec, at http://www.w3.org/TR/REC-xml/#CharClasses). If the 1.0 and 1.1 productions differ, the line representing the production is followed by either 1.0 or 1.1; otherwise, the production is the same in both specs. Productions may be repeated for the reader's convenience.
You will find references to the XML specification in this section. Any reference preceded by a section symbol (§) is a reference to the XML spec. For example, §2.1 refers to Section 2.1 of the XML 1.0 and 1.1 specifications.
Elements, which are the building blocks of XML documents, are bounded by start-tags and end-tags that may hold content, or may consist of one empty-element tag.

Productions

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] 1.0
[2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] 1.1
[3] S ::= (#x20 | #x9 | #xD | #xA)+
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
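The difference between the 1.0 and 1.1 versions of production [2] can be seen by transcribing both into code. The following is only a sketch: the function name is_xml_char and its version parameter are hypothetical, and it tests a single character against the Char production and nothing else.

```python
def is_xml_char(ch: str, version: str = "1.0") -> bool:
    """Return True if the single character ch matches production [2] Char."""
    cp = ord(ch)
    if version == "1.0":
        # 1.0 allows only tab, line feed and carriage return below #x20.
        low = cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
    elif version == "1.1":
        # 1.1 widens the lower range to start at #x1.
        low = 0x1 <= cp <= 0xD7FF
    else:
        raise ValueError("version must be '1.0' or '1.1'")
    # The two upper ranges are identical in both versions.
    return low or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF
```

For example, is_xml_char("\x01", "1.0") is False while is_xml_char("\x01", "1.1") is True; the null character is rejected by both.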
Additional content appearing in this section has been removed.
Document Type Definitions
A document type definition, or DTD, defines the structure or content model of a valid XML instance.

Productions

[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
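The operators in productions [47] through [50] (',' for sequence, '|' for choice, and the '?', '*', and '+' occurrence indicators) behave much like regular-expression operators over a list of child element names. The sketch below leans on that similarity to test a list of children against an element-content model. Everything in it is an assumption for illustration: the function names, the sample model, and the ASCII-only approximation of Name. It does not handle EMPTY, ANY, or Mixed content, and it is not how a validating parser is required to work.

```python
import re

# Simplified, ASCII-only approximation of an XML Name (an assumption made
# for brevity; the real Name production allows many more characters).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def content_model_to_regex(model):
    """Translate a 'children' content model into a compiled regex."""
    compact = re.sub(r"\s+", "", model)  # optional S carries no meaning here
    # Each element name becomes a group that matches "name;".
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", compact)
    # A sequence is plain concatenation, so the commas are dropped;
    # '|', '?', '*', '+' and parentheses already mean the same in a regex.
    return re.compile(body.replace(",", ""))


def children_match(model, child_names):
    """Return True if the child element names satisfy the content model."""
    encoded = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None


# Hypothetical content model and child lists.
MODEL = "(title, (para | list)*, footer?)"
ok = children_match(MODEL, ["title", "para", "list", "para"])
bad = children_match(MODEL, ["para", "title"])
```

With this model, ok is True because title comes first and is followed by any mix of para and list, while bad is False because the sequence starts with para.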

Examples

<!ELEMENT message (#PCDATA)>
<!ATTLIST message date CDATA #REQUIRED>

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet href="mine.css" type="text/css" ?>
<!--This is a very simple document.-->
<!DOCTYPE message [
 <!ELEMENT message (#PCDATA)>
 <!ATTLIST message date CDATA #REQUIRED>
]>
<message xmlns="http://simonstl.com/ns/examples/message"
         xml:lang="eng" date="20051006" >
   This is a message!
</message>
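To see how a generic parser reads the document above, the following sketch feeds it to Python's xml.etree.ElementTree. This parser is non-validating: it accepts the internal DTD subset but does not check the document against it. The variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The example document, passed as bytes so that the encoding declaration
# in the XML declaration is honored by the parser.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet href="mine.css" type="text/css" ?>
<!--This is a very simple document.-->
<!DOCTYPE message [
 <!ELEMENT message (#PCDATA)>
 <!ATTLIST message date CDATA #REQUIRED>
]>
<message xmlns="http://simonstl.com/ns/examples/message"
         xml:lang="eng" date="20051006" >
   This is a message!
</message>
"""

root = ET.fromstring(DOCUMENT)

# ElementTree reports namespaced names as "{namespace-uri}local-name".
tag = root.tag
date = root.get("date")
lang = root.get("{http://www.w3.org/XML/1998/namespace}lang")
text = root.text.strip()
```

The default namespace declaration shows up as part of the element name rather than as an attribute, and the xml:lang attribute is reported under the reserved XML namespace.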

Description

XML inherited the DTD from SGML. The DTD is the native, grammar-based language for validating the structure of XML documents—though markup declarations are not specified in XML syntax—and is interwoven into the XML 1.0 and 1.1 specifications. A DTD can define elements, attributes, entities, and notations, and can contain comments (just like XML comments), conditional sections, and a structure unique to DTDs called
Additional content appearing in this section has been removed.
W3C XML Schema
XML Schema, sometimes abbreviated XSD or referred to as W3C XML Schema (WXS), is an XML vocabulary that enables you to describe other XML vocabularies so that programs can test whether a given document meets rules laid down in the schema. XML Schema is defined by a set of three W3C Recommendations:
XML Schema Part 0: Primer
A tutorial for XML Schema that explains Parts 1 and 2 in less detail and with more examples and integration; available at http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures
An XML vocabulary for describing the structures of XML vocabularies; based on a mixture of markup and object-oriented design; available at http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes
A set of extensible types for describing the contents of XML elements and attributes, including things like integers, decimals, and dates; available at http://www.w3.org/TR/xmlschema-2/
The mechanisms for defining structures and datatypes both allow schema designers to create type systems that may be extended or restricted.
For more general information on XML Schema, see Eric van der Vlist's XML Schema (O'Reilly) or Priscilla Walmsley's Definitive XML Schema (Prentice-Hall). The Primer noted in the preceding list may also be a good place to start.
XML Schema 1.0, Second Edition, is the current version endorsed by the W3C, though work on XML Schema 1.1 has begun.
While all schemas use the same core parts, there are a number of different structural alternatives and key pieces worth examining before diving into all of the parts. Examine the structure of the following example.
Example . A simple XML document for definition in a schema
Additional content appearing in this section has been removed.
RELAX NG
RELAX NG is a simple yet elegant schema language for XML. It was developed at OASIS under the leadership of James Clark and Murata Makoto and grew out of earlier efforts on the schema languages TREX (by Clark) and RELAX (by Murata). After becoming an OASIS committee specification in late 2001, RELAX NG was later standardized under ISO's Document Schema Definition Languages (DSDL) effort as ISO/IEC 19757-2.
RELAX NG is easy to learn, easy to use, and is supported by a broad variety of free tools. It can be expressed in XML syntax or in a compact, non-XML syntax. Its use is certainly not as widespread as W3C XML Schema, but RELAX NG continues to be a favorite among XML experts.
The RELAX NG XML-syntax tutorial is at http://relaxng.org/tutorial-20011203.html; the compact-syntax tutorial is at http://relaxng.org/compact-tutorial-20030326.html; and the specification is at http://relaxng.org/spec-20011203.html. For more information, see http://relaxng.org and http://dsdl.org. Eric van der Vlist's RELAX NG (O'Reilly) is also an excellent resource (an online version is available at http://books.xmlschemata.org/relaxng/). The following material is intended for quick reference on usage and syntax. For a complete, detailed reference, I recommend Chapters 17 and 18 of van der Vlist's RELAX NG.
The following RELAX NG reference is organized by XML element name; an associated compact syntax is provided in an example for each. The element names in headings are prefixed with rng: to distinguish them from XML Schema elements with identical names; however, the prefix is not normally necessary in common usage, nor is it used in provided examples.
The datatypeLibrary and ns attributes are legal on all elements, though in some instances they have no effect. datatypeLibrary names the datatype library to be used in the schema, and ns specifies the default namespace for either the element or attribute, depending on context. It is common to specify the W3C XML Schema datatype namespace (
Additional content appearing in this section has been removed.
Schematron
Schematron, which was developed by Rick Jelliffe, is a simple yet powerful schema language for XML and has recently become an ISO standard candidate (ISO/IEC 19757-3; see http://www.dsdl.org). Schematron uses rule-based validation rather than the grammar-based validation used by XML Schema and RELAX NG, among others. It uses expressions written in XPath to precisely examine nodes in an instance, thus becoming, as Jelliffe puts it, the "feather duster" that can reach into corners where grammar-based languages cannot. Schematron is good at testing for co-occurrence constraints, that is, constraints based on the existence of a value or structure that in turn is based on the existence of another value or structure.
The most common version of Schematron is 1.5. You can obtain a reference implementation (an XSLT stylesheet) for Version 1.5 from http://xml.ascc.net/schematron/1.5/. A variety of Schematron validators are available from Topologi (http://www.topologi.com/). You can also get information on the new ISO Schematron and its implementations from http://www.schematron.com/.
A Schematron 1.5 schema is shown in the following example.
Example. horse.sch
1 <?xml version="1.0" encoding="US-ASCII"?>
2 <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
3  <sch:title>Horse schema</sch:title>
4  <sch:pattern>
5   <sch:rule context="horse">
6    <sch:assert test="@legs = '4'">Our horses should have 4 legs.</sch:assert>
7    <sch:assert test="snip">Our horses should have a snip.</sch:assert>
8    <sch:report test="blaze">This horse has a blaze.</sch:report>
9    <sch:report test="star">This horse has a star.</sch:report>
10  </sch:rule>
11 </sch:pattern>
12 </sch:schema>
Line 1 is simply an XML declaration. Line 2 is the root element of the schema that contains a namespace declaration. The namespace URI for Schematron 1.5 is http://www.ascc.net/xml/schematron; alternatively, the namespace is
Additional content appearing in this section has been removed.
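The difference between assert (a message is produced when the test fails) and report (a message is produced when the test succeeds) in horse.sch can be illustrated by translating the four tests into Python by hand. This is only a sketch under stated assumptions: it is not a Schematron implementation, the function name check_horse and the sample stable document are hypothetical, and the XPath tests are replaced with equivalent ElementTree calls.

```python
import xml.etree.ElementTree as ET


def check_horse(horse):
    """Apply the four tests from horse.sch to one horse element."""
    failed_asserts = []
    reports = []
    # assert test="@legs = '4'": the message appears when the test is false.
    if horse.get("legs") != "4":
        failed_asserts.append("Our horses should have 4 legs.")
    # assert test="snip": true when a snip child element exists.
    if horse.find("snip") is None:
        failed_asserts.append("Our horses should have a snip.")
    # report test="blaze" / "star": the message appears when the test is true.
    if horse.find("blaze") is not None:
        reports.append("This horse has a blaze.")
    if horse.find("star") is not None:
        reports.append("This horse has a star.")
    return failed_asserts, reports


# Hypothetical instance document with two horses.
stable = ET.fromstring(
    '<stable>'
    '<horse legs="4"><snip/><star/></horse>'
    '<horse legs="3"><blaze/></horse>'
    '</stable>'
)

# rule context="horse": the tests apply to every horse element.
results = [check_horse(horse) for horse in stable.iter("horse")]
```

The first horse passes both assertions and triggers the star report; the second fails both assertions and triggers the blaze report.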
XML Specifications
The following list of XML-related specifications is by no means comprehensive but is provided as a quick reference to the URIs for prominent specs.
Additional content appearing in this section has been removed.