Chapter 1. Data Model

XSLT is a language for transforming XML documents. The input to an XSLT program (a “stylesheet”) is one or more XML documents. The output is another document, which may be XML, HTML, or text. XSLT operates on an abstraction of XML, called the XSLT data model (the XPath data model with some additions). XSLT is “closed” over this data model. In other words, its data model applies both to its input and its output. In fact, it even models the stylesheet, which is itself expressed in XML.

Tip

Unless explicitly followed by “2.0,” whenever this book speaks of “XSLT” or “XPath,” it is referring to the 1.0 versions of these languages.

Node Types

The XPath data model describes an XML document as a tree of nodes. There are seven types of nodes:

root
text
element
attribute
processing instruction
namespace
comment

In the XPath 1.0 data model, all XML documents have a single root node, which is an invisible container for the entire document. The root node is not an element.

Tip

XPath 2.0 uses the term “document node” instead of “root node.” Regardless of what it’s called, don’t confuse it with the “root element” or “document element,” which is an element: a child of the root node, or document node.
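The distinction is easy to see in a small Python sketch. It assumes the standard library's xml.dom.minidom as a stand-in for an XPath tree. minidom is a DOM implementation, not an XPath processor, but its Document node plays the role of the root node. The processing instruction and comment that sit outside the book element are children of the root node, alongside the document element.

```python
from xml.dom import minidom, Node

# A PI before, and a comment after, the document element.
doc = minidom.parseString("<?style sheet?><book><title>XSLT</title></book><!--end-->")

root = doc                              # the invisible root (document) node
document_element = doc.documentElement  # the one top-level element

print(root.nodeType == Node.DOCUMENT_NODE)   # True: the root node is not an element
print(document_element.tagName)              # book
print(document_element.parentNode is root)   # True: the document element is a child of the root
# The root's children: the PI, the document element, and the comment.
print([child.nodeName for child in root.childNodes])
```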

There is one element node for each element, one attribute node for each attribute (excluding namespace declarations), one comment node for each comment, and one processing instruction node for each processing instruction (PI) that occurs in an XML document. A contiguous sequence of character data, after expanding all entities and CDATA sections, is modeled as a single text node. Finally, there is a namespace node attached to each element for each namespace/prefix binding that is in scope on that element. Each element has its own unique set of namespace nodes, which always includes at least one namespace node that corresponds to the implicit mapping between the prefix "xml" and the URI "http://www.w3.org/XML/1998/namespace" (reserved for attributes such as xml:lang and xml:space).
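The counting rules above can be sketched in Python. This is a minimal sketch that assumes a tree built by xml.dom.minidom, which may keep CDATA sections and adjacent character data as separate DOM nodes. To compensate, the hypothetical function xpath_node_counts merges each contiguous run into a single XPath text node. It also skips xmlns attributes, because namespace declarations are not attribute nodes. Namespace nodes are left out of the count.

```python
from collections import Counter
from xml.dom import minidom, Node

def xpath_node_counts(xml_text):
    """Count nodes the way the XPath 1.0 data model would (namespace nodes omitted)."""
    doc = minidom.parseString(xml_text)
    counts = Counter(root=1)  # every document has exactly one root node

    def walk(parent):
        in_text_run = False  # are we inside a run of contiguous character data?
        for child in parent.childNodes:
            if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
                if not in_text_run:
                    counts["text"] += 1  # one text node per contiguous run
                in_text_run = True
                continue
            in_text_run = False  # any other node ends the current text run
            if child.nodeType == Node.ELEMENT_NODE:
                counts["element"] += 1
                attrs = child.attributes
                for i in range(attrs.length):
                    name = attrs.item(i).name
                    # Namespace declarations are not attribute nodes.
                    if name != "xmlns" and not name.startswith("xmlns:"):
                        counts["attribute"] += 1
                walk(child)
            elif child.nodeType == Node.COMMENT_NODE:
                counts["comment"] += 1
            elif child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                counts["processing instruction"] += 1
            # Anything else (e.g. a DOCTYPE node) has no XPath counterpart.

    walk(doc)
    return counts
```

In the input `<a x="1" xmlns:p="urn:p">x<![CDATA[y]]>z<b/></a>`, for instance, the text "x", the CDATA section, and "z" together form a single text node, and only x counts as an attribute.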

Tip

Thus, even for a document that does not explicitly use namespaces, there will be as many namespace nodes as there are elements.
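The following sketch computes the namespace nodes of an element. It again assumes a minidom tree in which namespace declarations appear as xmlns attributes. minidom has no namespace node objects, so the hypothetical function namespace_nodes returns a prefix-to-URI dictionary instead, with "" standing for the default namespace. The xml binding is always present, so a document with no namespace declarations yields exactly one namespace node per element, as the Tip says.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"

def namespace_nodes(element):
    """Return the in-scope prefix -> URI bindings ("" is the default namespace)."""
    ancestors = []
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        ancestors.append(node)
        node = node.parentNode
    bindings = {"xml": XML_NS}  # the implicit binding every element has
    for el in reversed(ancestors):  # outermost first, so inner declarations win
        attrs = el.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == "xmlns":
                prefix = ""
            elif attr.name.startswith("xmlns:"):
                prefix = attr.name[len("xmlns:"):]
            else:
                continue  # an ordinary attribute, not a declaration
            if attr.value:
                bindings[prefix] = attr.value
            else:
                bindings.pop(prefix, None)  # xmlns="" removes the default namespace
    return bindings
```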

Node Properties

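Table 1-1 lists four node properties and their applicability for each type of node. These properties deal with a node’s relationship to other nodes. A cell containing n/a means the property is not applicable to that node type.

Table 1-1. Node relationship properties

Node type | Parent | Children | Attributes | Namespace nodes
----------|--------|----------|------------|----------------
Root | n/a | Ordered list of 0 or more elements, PIs, comments, and text nodes | n/a | n/a
Element | Element or root | Ordered list of 0 or more elements, PIs, comments, and text nodes | Unordered list of 0 or more attribute nodes | Unordered list of 1 or more namespace nodes
PI | Element or root | n/a | n/a | n/a
Comment | Element or root | n/a | n/a | n/a
Text | Element or root | n/a | n/a | n/a
Attribute | Element | n/a | n/a | n/a
Namespace | Element | n/a | n/a | n/a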

In the XPath language, to access a node’s parent, child nodes, attributes, or namespace nodes, use the corresponding axis: parent, child, attribute, or namespace. See the section Axes in Chapter 2.

Tip

Attributes and namespace nodes are not children. An element is considered to be the parent of an attribute or namespace node, but the attribute or namespace node is not considered to be the element’s child.

The descendants of a node consist of the node’s children, its children’s children, and so on.
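The child and descendant axes can be sketched as follows, again assuming a minidom tree as a stand-in for the XPath model. The helper names children and descendants are illustrative, not part of any XPath API. The descendants are produced depth-first, which yields them in document order. Attributes never show up, because attributes are not children.

```python
from xml.dom import minidom, Node

# Node types that can appear on the XPath child axis.
CHILD_TYPES = {
    Node.ELEMENT_NODE,
    Node.TEXT_NODE,
    Node.CDATA_SECTION_NODE,
    Node.COMMENT_NODE,
    Node.PROCESSING_INSTRUCTION_NODE,
}

def children(node):
    """Child axis: attributes and namespace nodes are never included."""
    return [c for c in node.childNodes if c.nodeType in CHILD_TYPES]

def descendants(node):
    """Descendant axis: children, their children, and so on, in document order."""
    for child in children(node):
        yield child
        yield from descendants(child)

doc = minidom.parseString('<a id="1"><b>t</b><!--c--></a>')
print([n.nodeName for n in descendants(doc)])  # ['a', 'b', '#text', '#comment']
```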

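All nodes, regardless of their type, have a string-value and a base URI. Some types of nodes have an expanded-name, which consists of two strings: a local part and a namespace URI. Element nodes have an optional unique ID. The root node also carries a mapping from the names of the unparsed entities declared in the DTD to their URIs. For each of these string-typed node properties, Table 1-2 lists the node types it applies to and how its value is determined. Once again, a cell containing n/a means the property is not applicable to that node type.

Table 1-2. String-typed node properties

Node type | String-value | Expanded-name (local/URI) | Base URI | Unique ID | Unparsed entity URIs
----------|--------------|---------------------------|----------|-----------|---------------------
Root | Concatenation of descendant text nodes’ string-values, in document order | n/a | URI of the document entity | n/a | A set of mappings between declared unparsed entity names and their URIs
Element | Concatenation of descendant text nodes’ string-values, in document order | Local: local name; URI: namespace name | URI of the external entity containing the element, if any; otherwise, base URI of root | Value of attribute declared as type ID in DTD (optional) | n/a
PI | Text following PI target and whitespace | Local: PI target; URI: null | URI of the external entity containing the PI, if any; otherwise, base URI of root | n/a | n/a
Comment | Content of comment | n/a | Base URI of parent node | n/a | n/a
Text | Character data (at least one character) | n/a | Base URI of parent node | n/a | n/a
Attribute | Normalized attribute value | Local: local name; URI: namespace name | Base URI of parent node | n/a | n/a
Namespace | Namespace URI | Local: namespace prefix (empty for the default namespace); URI: null | Base URI of parent node | n/a | n/a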

The XPath language provides functions for directly accessing most of these properties. To access the string-value of a node, use the string( ) function.

Tip

It’s not usually necessary to use string( ) explicitly, thanks to XPath’s automatic conversion of data types. See the Data Type Conversions section in Chapter 5.
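The string-value rules in Table 1-2 can be sketched over a minidom tree. This is a sketch under that assumption, not how an XSLT processor implements string( ). The function name string_value is hypothetical. For the root and element nodes it concatenates only descendant text, so comments and processing instructions contribute nothing.

```python
from xml.dom import minidom, Node

TEXTUAL = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

def string_value(node):
    """Return the XPath 1.0 string-value of a minidom node (sketch)."""
    kind = node.nodeType
    if kind in (Node.DOCUMENT_NODE, Node.ELEMENT_NODE):
        # Concatenate descendant text in document order; comments and PIs
        # contribute nothing, so only elements and text are visited.
        return "".join(
            string_value(child)
            for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE or child.nodeType in TEXTUAL
        )
    if kind in TEXTUAL:
        return node.data
    if kind == Node.ATTRIBUTE_NODE:
        return node.value  # the parser has already normalized the value
    if kind in (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE):
        return node.data  # comment content, or PI text after target and whitespace
    raise ValueError(f"unsupported node type {kind}")

doc = minidom.parseString("<p>Hello, <b>big</b><!--skip--> world<?pi ignored?></p>")
print(string_value(doc))  # Hello, big world
```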

To access the local and namespace URI parts of a node’s expanded-name, use the local-name( ) and namespace-uri( ) functions, respectively.
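As an illustration of what local-name( ) and namespace-uri( ) report, the following sketch reads the same two parts from a namespace-aware minidom parse (an assumption: minidom.parseString resolves namespaces by default). The hypothetical function expanded_name returns a (local part, namespace URI) pair, with None standing for a null URI. Namespace nodes are not covered, because minidom does not represent them. Their pair would be the prefix and a null URI.

```python
from xml.dom import minidom, Node

def expanded_name(node):
    """Return (local part, namespace URI) per Table 1-2, or None if there is none."""
    if node.nodeType in (Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE):
        return (node.localName, node.namespaceURI)  # URI is None when unqualified
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return (node.target, None)  # PI target as local part, null URI
    return None  # root, text, and comment nodes have no expanded-name

doc = minidom.parseString('<p:doc xmlns:p="urn:example" id="7"><?render fast?></p:doc>')
el = doc.documentElement
print(expanded_name(el))                         # ('doc', 'urn:example')
print(expanded_name(el.getAttributeNode("id")))  # ('id', None)
```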

The base URI property is used to resolve relative URIs, for example by XSLT’s document( ) function and by the xsl:import and xsl:include elements. XSLT/XPath 1.0 does not provide a direct way to access the base URI property.
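Even without direct access, the base URI rules in Table 1-2 are simple to model. The class TreeNode below is a hypothetical minimal node type with just enough fields to apply those rules. The helper resolve then joins a relative reference onto the node's base URI using urllib.parse.urljoin, roughly as a processor might when resolving an href. This is a sketch: it does not model document( )'s optional second argument or any processor-specific behavior.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

@dataclass
class TreeNode:
    """Hypothetical minimal node: just enough to apply the base URI rules."""
    kind: str                           # "root", "element", "pi", "text", "comment", "attribute", "namespace"
    parent: Optional["TreeNode"] = None
    document_uri: Optional[str] = None  # set on the root node only
    entity_uri: Optional[str] = None    # set on elements/PIs that came from an external entity

def base_uri(node: TreeNode) -> Optional[str]:
    if node.kind == "root":
        return node.document_uri        # URI of the document entity
    if node.kind in ("element", "pi"):
        if node.entity_uri is not None:
            return node.entity_uri      # URI of the containing external entity
        while node.parent is not None:  # otherwise, base URI of the root
            node = node.parent
        return node.document_uri
    return base_uri(node.parent)        # text, comment, attribute, namespace: parent's base URI

def resolve(node: TreeNode, href: str) -> str:
    """Resolve a relative URI reference against the node's base URI."""
    return urljoin(base_uri(node), href)

root = TreeNode("root", document_uri="http://example.com/docs/book.xml")
chapter = TreeNode("element", parent=root, entity_uri="http://example.com/docs/ch/ch1.xml")
note = TreeNode("comment", parent=chapter)
para = TreeNode("element", parent=root)
print(resolve(para, "img/a.png"))  # http://example.com/docs/img/a.png
print(resolve(note, "fig.png"))    # http://example.com/docs/ch/fig.png
```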

Tip

XPath 2.0, however, includes a function, base-uri( ), for directly accessing the base URI of a given node. It also uses the xml:base attribute to determine the base URI of a node (unlike XSLT 1.0).

The unique ID property is queried by the id( ) function to retrieve elements according to their ID value. There is no function to access the unique ID property directly, but that is not normally necessary, since you can easily access an element’s attribute values using the attribute axis.
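The lookup that id( ) performs with a string argument can be sketched as follows. The argument is split on whitespace, each token is looked up, and the matching elements are returned in document order. This sketch does not read the DTD. Instead it assumes, hypothetically, that an attribute named id is declared as type ID. The helper names build_id_index and xpath_id are also illustrative. If an invalid document repeats an ID, the first element in document order keeps it.

```python
from xml.dom import minidom, Node

def build_id_index(doc, id_attr="id"):
    """Index elements by a (hypothetically) ID-typed attribute."""
    elements = []  # all elements, in document order
    index = {}     # ID value -> position in `elements`
    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.ELEMENT_NODE:
                value = child.getAttribute(id_attr)  # "" when absent
                if value and value not in index:     # first occurrence wins
                    index[value] = len(elements)
                elements.append(child)
                walk(child)
    walk(doc)
    return elements, index

def xpath_id(elements, index, ids):
    """Mimic id("a b ..."): whitespace-separated tokens, results in document order."""
    positions = sorted({index[token] for token in ids.split() if token in index})
    return [elements[p] for p in positions]

doc = minidom.parseString('<doc><sec id="s1"><p id="p1"/></sec><sec id="s2"/></doc>')
elements, index = build_id_index(doc)
print([e.getAttribute("id") for e in xpath_id(elements, index, "s2 s1 missing")])  # ['s1', 's2']
```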

Finally, use the unparsed-entity-uri( ) function to retrieve the URI of an unparsed entity with a given name.
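The root node's unparsed-entity mapping can be sketched with Python's expat binding, whose EntityDeclHandler reports a notation name only for unparsed (NDATA) entities. The sketch makes three assumptions. It reads only the internal DTD subset. The caller supplies the document's URI, which is used to make each system identifier absolute. And the function names are hypothetical. As in XPath, looking up an undeclared name yields the empty string.

```python
import xml.parsers.expat
from urllib.parse import urljoin

def unparsed_entity_uris(xml_text, document_uri):
    """Map each declared unparsed entity name to an absolute URI (sketch)."""
    mapping = {}
    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Only unparsed entities carry a notation name.
        if notation_name is not None and not is_parameter_entity:
            mapping.setdefault(name, urljoin(document_uri, system_id))  # first declaration binds
    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return mapping

def unparsed_entity_uri(mapping, name):
    return mapping.get(name, "")  # empty string when no such entity exists

source = """<?xml version="1.0"?>
<!DOCTYPE doc [
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
]>
<doc/>"""
uris = unparsed_entity_uris(source, "http://example.com/docs/book.xml")
print(unparsed_entity_uri(uris, "logo"))  # http://example.com/docs/images/logo.gif
```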

All of XPath and XSLT’s built-in functions are described in Chapter 5.
