http://www.w3.org/TR/NOTE-xml-schema-req),
listed a wide variety of usage scenarios for schemas as well as
the design principles that guided its creation.
<?xml version="1.0"?>
<library>
<book id="b0836217462" available="true">
<isbn>
0836217462
</isbn>
<title lang="en">
Being a Dog Is a Full-Time Job
</title>
<author id="CMS">
<name>
Charles M Schulz
</name>
<born>
1922-11-26
</born>
<dead>
2000-02-12
</dead>
</author>
<character id="PP">
<name>
Peppermint Patty
</name>
<born>
1966-08-22
</born>
<qualification>
bold, brash and tomboyish
</qualification>
</character>
<character id="Snoopy">
<name>
Snoopy
</name>
<born>
1950-10-04
</born>
<qualification>
extroverted beagle
</qualification>
</character>
<character id="Schroeder">
<name>
Schroeder
</name>
<born>
1951-05-30
</born>
<qualification>
brought classical music to the Peanuts strip
</qualification>
</character>
<character id="Lucy">
<name>
Lucy
</name>
<born>
1952-03-03
</born>
<qualification>
bossy, crabby and selfish
</qualification>
</character>
</book>
</library>
author,
book, born,
character, dead,
isbn, library,
name, qualification, and
title, and the attributes are
available, id, and
lang.
schema), which belongs to
the W3C XML Schema namespace (http://www.w3.org/2001/XMLSchema) and is
usually prefixed as "xs."
name, born,
and title have simple content models:
.../...
<title lang="en">
Being a Dog Is a Full-Time Job
</title>
.../...
<name>
Charles M Schulz
</name>
<born>
1922-11-26
</born>
.../...
library. This element was defined
in the earlier schema as:
<xs:element name="library">
<xs:complexType>
<xs:sequence>
<xs:element ref="book" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
book element with the actual definition of this
element:
<xs:element name="library">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element ref="isbn"/>
<xs:element ref="title"/>
<xs:element ref="author" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element ref="character" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute ref="id"/>
<xs:attribute ref="available"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
book element is
contained inside the definition of the library
element, other definitions of book elements could
be done at other locations in the schema without any risk of
confusion—except maybe by human readers.
book element cannot be reused elsewhere, and the
book element can no longer be a document element.
xs:complexType and
xs:group—we have sacrificed the modularity of our
first schema to gain the depth and structure of the second one. This
is a general tendency in W3C XML Schema.
name appears both
within author and character
with the same datatype. We may want to define the element
name with different content models in
author and character, as in
this instance document:
<?xml version="1.0"?>
<library>
<book id="b0836217462" available="true">
<isbn>
0836217462
</isbn>
<title lang="en">
Being a Dog Is a Full-Time Job
</title>
<author id="CMS">
<name>
<first>
Charles
</first>
<middle>
M.
</middle>
<last>
Schulz
</last>
</name>
<born>
1922-11-26
</born>
<dead>
2000-02-12
</dead>
</author>
<character id="Snoopy">
<name>
Snoopy
</name>
<born>
1950-10-04
</born>
<qualification>
extroverted beagle
</qualification>
</character>
</book>
</library>
name, we need to define at least one of the
name elements locally under its parent.
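A minimal sketch of what such local definitions could look like. It assumes that first, middle, and last are simple string elements and that middle is optional (neither is stated above), and it reuses the global born, dead, and qualification elements and the id attribute from the earlier schema:

```xml
<!-- Sketch only: name is declared locally, once under each parent -->
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <!-- Local name with element content (first, middle, last) -->
      <xs:element name="name">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="first" type="xs:string"/>
            <!-- minOccurs="0" is an assumption: not every author has a middle name -->
            <xs:element name="middle" type="xs:string" minOccurs="0"/>
            <xs:element name="last" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <!-- Local name with simple content -->
      <xs:element name="name" type="xs:string"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
```

Because each name is scoped to its parent, the two declarations do not conflict even though they share a name.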
xs:string
and
xs:normalizedString).
xs:string
, since no
whitespace replacement is performed on the parsed value for this).
xs:string
, since no
whitespace replacement is performed on the parsed value for this, and
for
xs:normalizedString
, in which whitespaces are only normalized).
normalize-space(), which corresponds to what W3C XML Schema calls
whitespace collapsing. It is also different from the DOM
normalize() method, which merges adjacent
text nodes.
xs:string
primitive datatype as well as other datatypes that have a similar
behavior (namely,
xs:hexBinary
,
xs:base64Binary
,
xs:anyURI
,
xs:QName
, and
xs:NOTATION
). These
types are not expected to carry any quantifiable value (W3C XML
Schema doesn't even expect to be able to sort them)
and their value space is identical to their lexical space except when
explicitly described otherwise. One should note that even though they
are grouped in this section because they have a similar behavior,
these primitive datatypes are considered quite different by the
Recommendation.
xs:string
and
xs:normalizedString
) are string
datatypes. One of the main differences between these types is the
applied whitespace processing. To stress this difference, we will
classify these types by their whitespace processing.
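As a rough illustration of this classification, the following sketch declares three elements whose names are hypothetical and chosen only to contrast the whitespace processing applied by each type:

```xml
<!-- Hypothetical element names, used only to contrast the whitespace rules -->

<!-- preserve: the value is kept exactly as parsed -->
<xs:element name="raw" type="xs:string"/>

<!-- replace: each tab, line feed, and carriage return becomes a space -->
<xs:element name="replaced" type="xs:normalizedString"/>

<!-- collapse: as for replace, then leading and trailing spaces are removed
     and runs of spaces are reduced to a single space -->
<xs:element name="collapsed" type="xs:token"/>
```

With content such as a title spread over several indented lines, only the last declaration would yield a value free of line breaks and surrounding spaces.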
xs:string
xs:string
.
xs:decimal
for all the decimal types (including the
integer datatypes, considered decimals without a fractional part),
xs:float
and
xs:double
for
single and double precision floats, and
xs:boolean
for Booleans. Whitespaces are collapsed for all these datatypes.
xs:decimal
primary type
and constitute a set of predefined types that address the most common
usages.
xs:decimal
xs:decimal
value needs to be finite. Although
the number of digits is not limited, we will see in the next chapter
how the author of a schema can derive user-defined datatypes with a
limited number of digits if needed.
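As a preview, and only as a sketch, such a derivation might use the xs:totalDigits and xs:fractionDigits facets. The type name amount and the chosen limits are hypothetical:

```xml
<!-- Hypothetical type: at most 8 digits in total, at most 2 after the decimal point -->
<xs:simpleType name="amount">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="8"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>
```

Under these assumptions, 123456.78 would be accepted while 1.234 would be rejected for having three fractional digits.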
xs:decimal
xs:NMTOKENS
,
xs:IDREFS
, and
xs:ENTITIES
). For all
the list datatypes, the items must be separated by one or more
whitespaces.
xs:NMTOKENS
xs:NMTOKEN
. Each item
of the list must be in the lexical space of
xs:NMTOKEN
.
xs:IDREFS
xs:IDREF
. Each item
of the list must be in the lexical space of
xs:IDREF
and must reference an existing
xs:ID
in the same document.
xs:ENTITIES
xs:ENTITY
. Each item
of the list must be in the lexical space of
xs:ENTITY
and must match an unparsed entity defined in a
DTD.
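A short sketch of a list datatype in use. The seeAlso attribute is hypothetical and would still have to be referenced from the complex type of character. The identifiers in the instance fragment are those of the library document shown earlier:

```xml
<!-- Hypothetical attribute holding a whitespace-separated list of references -->
<xs:attribute name="seeAlso" type="xs:IDREFS"/>

<!-- Instance fragment: every item must match an xs:ID present in the same document -->
<character id="Lucy" seeAlso="PP Snoopy Schroeder">
  <!-- name, born, and qualification as before -->
</character>
```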
anySimpleType. This
datatype is a kind of wildcard, which means, as expected, that any
value is accepted and doesn't add any constraint on
the lexical space.
anySimpleType has two other characteristics that
make it unique among simple types: user-defined simple
types cannot be derived from it, and its properties and canonical
form are not defined in the Recommendation. These characteristics make
it a type that should be avoided, except when the rules of a
derivation (which we will see in the next chapter) require its usage.
<xs:element name="name" type="xs:string"/>
<xs:element name="qualification" type="xs:string"/>
<xs:element name="born" type="xs:date"/>
<xs:element name="dead" type="xs:date"/>
<xs:element name="isbn" type="xs:string"/>
<xs:attribute name="id" type="xs:ID"/>
<xs:attribute name="available" type="xs:boolean"/>
<xs:attribute name="lang" type="xs:language"/>
born and
dead are ISO 8601 dates. The ISBN is
composed of numeric digits and a final character, which can be either
a digit or the letter "x", and is
therefore represented as a string. We also did a good job with the
datatypes for the id, available
and lang attributes, but the choice of
xs:string
for the elements name and
qualification is more controversial. They appear
in the instance document as:
<name>
Charles M Schulz
</name>
.../...
<qualification>
bold, brash and tomboyish
</qualification>
xs:token
instead of
xs:string
; the same applies to the
title element, which is a simple content derived
from
xs:string
that would be better derived from
xs:token
. This change will not have any impact on
the validation with our schema, but the document is more precisely
described and future derivations would be more easily built on
xs:token
than on
xs:string
. The
other datatype that could have been chosen better is
isbn, which can be represented as
xs:NMTOKEN. The new schema
would then be:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="name" type="xs:token"/>
<xs:element name="qualification" type="xs:token"/>
<xs:element name="born" type="xs:date"/>
<xs:element name="dead" type="xs:date"/>
<xs:element name="isbn" type="xs:NMTOKEN"/>
<xs:attribute name="id" type="xs:ID"/>
<xs:attribute name="available" type="xs:boolean"/>
<xs:attribute name="lang" type="xs:language"/>
<xs:element name="title">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:token">
<xs:attribute ref="lang"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="library">
<xs:complexType>
<xs:sequence>
<xs:element ref="book" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="author">
<xs:complexType>
<xs:sequence>
<xs:element ref="name"/>
<xs:element ref="born"/>
<xs:element ref="dead" minOccurs="0"/>
</xs:sequence>
<xs:attribute ref="id"/>
</xs:complexType>
</xs:element>
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element ref="isbn"/>
<xs:element ref="title"/>
<xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="character" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute ref="id"/>
<xs:attribute ref="available"/>
</xs:complexType>
</xs:element>
<xs:element name="character">
<xs:complexType>
<xs:sequence>
<xs:element ref="name"/>
<xs:element ref="born"/>
<xs:element ref="qualification"/>
</xs:sequence>
<xs:attribute ref="id"/>
</xs:complexType>
</xs:element>
</xs:schema>
xs:complexType
definitions (which we saw in our Russian doll design) and
xs:simpleType
definitions can be
either named or anonymous. Despite this similarity, simple and
complex types are very different. A simple type is a restriction on
the value of an element or an attribute (i.e., a constraint on the
content of a set of documents) while a complex type is a definition
of a content model (i.e., a constraint on the markup). This is why
the derivation methods for simple and complex types are very
different, even though W3C XML Schema used the same element name
(xs:restriction) for both. This is a common source of
confusion.
xs:positiveInteger
, which is a derivation by
restriction of
xs:integer
. The
restrictions can be defined along different aspects or axes that
W3C XML Schema calls
"facets."
xs:restriction
element and each facet is defined using a specific element embedded
in the xs:restriction element. The datatype on which
the restriction is applied is called the base datatype, which can be
referenced through a base attribute or
defined in the xs:restriction element:
<xs:simpleType name="myInteger">
<xs:restriction base="xs:integer">
<xs:minInclusive value="-2"/>
<xs:maxExclusive value="5"/>
</xs:restriction>
</xs:simpleType>
xs:simpleType
anonymous definition:
<xs:simpleType name="myInteger">
<xs:restriction>
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:maxExclusive value="5"/>
</xs:restriction>
</xs:simpleType>
<xs:minInclusive value="-2"/>
</xs:restriction>
</xs:simpleType>
xs:minInclusive
and
xs:maxExclusive
elements are two
facets that can be applied to an integer datatype. As can be guessed
from their names, they specify the minimum inclusive (i.e., that can
be reached) and maximum exclusive (i.e., that is not allowed) values.
We will introduce the list of facets in the next section. Depending
on the facet, each acts directly either on the value space or on the
lexical space of the datatype, and the same facet may have different
effects depending on the datatype on which it is applied.
<commaSeparated>
1, 2, 25
</commaSeparated>
<valueWithUnit>
10 em
</valueWithUnit>
<commaSeparated>
1 2 25
</commaSeparated>
<valueWithUnit unit="em">
10
</valueWithUnit>
<valueWithUnit>
10em
</valueWithUnit>
xs:list
element, which allows a definition
by reference to existing types or embeds a type definition (these two
syntaxes cannot be mixed).
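The two syntaxes can be sketched as follows. The type names and the upper bound of 100 are hypothetical:

```xml
<!-- By reference: itemType names an existing simple type -->
<xs:simpleType name="integerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

<!-- By embedding: the item type is an anonymous xs:simpleType -->
<xs:simpleType name="smallIntegerList">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>
```

An xs:list carrying both an itemType attribute and an embedded xs:simpleType would be rejected, since the two forms are mutually exclusive.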
xs:union
element, allowing a definition by
reference to existing types or by embedding type definition (these
two syntaxes can be mixed). The definition of a union datatype by
reference to existing types is done through a
memberTypes attribute containing a
whitespace-separated list of datatypes:
<xs:simpleType name="integerOrDate">
<xs:union memberTypes="xs:integer xs:date"/>
</xs:simpleType>
<xs:simpleType> elements:
<xs:simpleType name="myIntegerUnion">
<xs:union>
<xs:simpleType>
<xs:restriction base="xs:integer"/>
</xs:simpleType>
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="undefined"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
<xs:simpleType name="myIntegerUnion">
<xs:union memberTypes="xs:integer">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="undefined"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
myIntegerUnion type to be either less than 100 or
undefined except by defining a pattern. To do so, we can create a
type derived by restriction from a built-in type to be less than 100,
and perform the union to allow the value to be
"undefined" afterward. The only two
facets that can be applied to a union datatype are
xs:pattern
and
xs:enumeration
.
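A sketch of the approach just described, restricting first and performing the union afterward. The type name is hypothetical, and xs:maxExclusive is one possible way of expressing "less than 100":

```xml
<!-- Hypothetical type: an integer below 100, or the token "undefined" -->
<xs:simpleType name="smallIntegerOrUndefined">
  <xs:union>
    <!-- First member: the restriction is applied before the union -->
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxExclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
    <!-- Second member: the single allowed keyword -->
    <xs:simpleType>
      <xs:restriction base="xs:NMTOKEN">
        <xs:enumeration value="undefined"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
```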
xs:length
,
xs:maxLength
,
xs:minLength
,
xs:enumeration
, and
xs:whiteSpace
for derivation by list, and
xs:pattern
and
xs:enumeration
for derivation by union).
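As an illustration of a facet applied to a list, the following sketch restricts a list type by its number of items. Both type names are hypothetical:

```xml
<!-- Hypothetical list type defined by reference -->
<xs:simpleType name="dateList">
  <xs:list itemType="xs:date"/>
</xs:simpleType>

<!-- For a list, the length facets count items rather than characters -->
<xs:simpleType name="upToThreeDates">
  <xs:restriction base="dateList">
    <xs:maxLength value="3"/>
  </xs:restriction>
</xs:simpleType>
```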
<xs:simpleType name="listOfUnions">
<xs:list>
<xs:simpleType>
<xs:union memberTypes="xs:date xs:integer"/>
</xs:simpleType>
</xs:list>
</xs:simpleType>
<xs:simpleType name="UnionOfLists">
<xs:union>
<xs:simpleType>
<xs:list itemType="xs:date"/>
</xs:simpleType>
<xs:simpleType>
<xs:list itemType="xs:integer"/>
</xs:simpleType>
</xs:union>
</xs:simpleType>
<UnionOfLists> 2001-01-01 2001-01-02 </UnionOfLists>
<UnionOfLists> 1 2 3 </UnionOfLists>
<ListOfUnions> 2001-01-01 2001-01-02 </ListOfUnions>
<ListOfUnions> 1 2 3 </ListOfUnions>
<ListOfUnions> 2001-01-01 1 2 </ListOfUnions>
<UnionOfLists> 2001-01-01 1 2 </UnionOfLists>
<xs:element name="name" type="xs:token"/>
<xs:element name="qualification" type="xs:token"/>
<xs:element name="born" type="xs:date"/>
<xs:element name="dead" type="xs:date"/>
<xs:element name="isbn" type="xs:NMTOKEN"/>
<xs:attribute name="id" type="xs:ID"/>
<xs:attribute name="available" type="xs:boolean"/>
<xs:attribute name="lang" type="xs:language"/>
<xs:simpleType name="string255">
<xs:restriction base="xs:token">
<xs:maxLength value="255"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="string32">
<xs:restriction base="xs:token">
<xs:maxLength value="32"/>
</xs:restriction>
</xs:simpleType>
xs:length
. This facet is expressed as a number of characters and acts
on the value space. It constrains only the length, so it does not eliminate instances
such as ABCDEFGHIJ, but this is probably the best
we can do for the moment:
<xs:simpleType name="isbn">
<xs:restriction base="xs:NMTOKEN">
<xs:length value="10"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="supportedLanguages">
<xs:restriction base="xs:language">
<xs:enumeration value="en"/>
<xs:enumeration value="es"/>
</xs:restriction>
</xs:simpleType>
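One way these derived types might then be referenced from the declarations is sketched below. The pairing of each type with an element or attribute is an assumption made for illustration:

```xml
<!-- Assumed pairing of the derived types with the existing declarations -->
<xs:element name="name" type="string32"/>
<xs:element name="qualification" type="string255"/>
<!-- An element and a type may share a name: they live in separate symbol spaces -->
<xs:element name="isbn" type="isbn"/>
<xs:attribute name="lang" type="supportedLanguages"/>
```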