XSLT is a language for transforming XML documents. The input to an XSLT program (a “stylesheet”) is one or more XML documents. The output is another document, which may be XML, HTML, or text. XSLT operates on an abstraction of XML, called the XSLT data model (the XPath data model with some additions). XSLT is “closed” over this data model. In other words, its data model applies both to its input and its output. In fact, it even models the stylesheet, which is itself expressed in XML.
Tip
Unless explicitly followed by “2.0,” whenever this book speaks of “XSLT” or “XPath,” it is referring to the 1.0 versions of these languages.
The XPath data model describes an XML document as a tree of nodes. There are seven types of nodes: root, element, processing instruction, comment, text, attribute, and namespace.
In the XPath 1.0 data model, all XML documents have a single root node, which is an invisible container for the entire document. The root node is not an element.
Tip
XPath 2.0 uses the term “document node” instead of “root node.” Regardless of what it’s called, don’t confuse it with the “root element” or “document element,” which is an element: a child of the root node, or document node.
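To see the root node and the document element side by side, here is a minimal sketch using Python’s standard xml.dom.minidom module. It assumes that minidom’s Document object stands in for the XPath root node. The DOM is a different data model from XPath’s, so the correspondence is only approximate. The XPATH_KIND table and the sample document are illustrative, not part of any API.

```python
from xml.dom import minidom, Node

# Illustrative mapping from DOM node types to the XPath 1.0 node-type names
# used in this chapter (namespace nodes have no DOM counterpart).
XPATH_KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

# The PI and the comment sit outside the document element, so they are
# children of the root node and siblings of <book>.
SOURCE = (
    '<?xml-stylesheet href="style.xsl" type="text/xsl"?>'
    "<!-- front matter -->"
    "<book><title>XSLT</title></book>"
)

doc = minidom.parseString(SOURCE)
document_element = doc.documentElement

print(XPATH_KIND[doc.nodeType])                          # root
print([XPATH_KIND[n.nodeType] for n in doc.childNodes])  # ['processing instruction', 'comment', 'element']
print(document_element.tagName)                          # book
print(document_element.parentNode is doc)                # True
```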
There is one element node for each element, one attribute node for each attribute (excluding namespace declarations), one comment node for each comment, and one processing instruction node for each processing instruction (PI) that occurs in an XML document. A contiguous sequence of character data, after expanding all entities and CDATA sections, is modeled as a single text node. Finally, there is a namespace node attached to each element for each namespace/prefix binding that is in scope on that element. Each element has its own unique set of namespace nodes, which always includes at least one namespace node that corresponds to the implicit mapping between the prefix "xml" and the URI "http://www.w3.org/XML/1998/namespace" (reserved for attributes such as xml:lang and xml:space).
Tip
Thus, even for a document that does not explicitly use namespaces, there will be as many namespace nodes as there are elements.
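The modeling rules above can be sketched in Python on top of xml.dom.minidom. The two helpers are hypothetical names introduced here:

- text_runs merges adjacent text and CDATA content into the single text node XPath would see. The parser has already expanded entity references.
- namespace_nodes computes each element’s set of namespace nodes, including the implicit xml binding.

The sketch handles only namespace declarations written as xmlns attributes, which is how minidom exposes them.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"


def text_runs(element):
    """Group adjacent text and CDATA children into XPath-style text nodes.

    Any other kind of child (element, comment, PI) ends the current run.
    """
    runs, current = [], []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            current.append(child.data)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def namespace_nodes(xml_text):
    """Return (element name, {prefix: namespace URI}) for every element.

    Each dict models that element's namespace nodes: one per in-scope
    binding, always including the implicit "xml" prefix. The default
    namespace is keyed by the empty string.
    """
    doc = minidom.parseString(xml_text)
    result = []

    def visit(element, inherited):
        scope = dict(inherited)  # each element gets its own copy
        for name, value in element.attributes.items():
            if name == "xmlns":
                if value:
                    scope[""] = value
                else:
                    scope.pop("", None)  # xmlns="" undeclares the default
            elif name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
        result.append((element.tagName, scope))
        for child in element.childNodes:
            if child.nodeType == Node.ELEMENT_NODE:
                visit(child, scope)

    visit(doc.documentElement, {"xml": XML_NS})
    return result


p = minidom.parseString("<p>a &amp; <![CDATA[<b>]]> c<!--split-->d</p>").documentElement
print(text_runs(p))  # ['a & <b> c', 'd']

for name, scope in namespace_nodes('<a xmlns:p="urn:p"><b xmlns="urn:d"/><c/></a>'):
    print(name, sorted(scope))
# a ['p', 'xml']
# b ['', 'p', 'xml']
# c ['p', 'xml']
```

Run on a document that declares no namespaces, such as <r><s/><t/></r>, namespace_nodes reports exactly one binding (xml) per element. That gives three namespace nodes for three elements.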
Table 1-1 lists four node properties and their applicability for each type of node. These properties deal with a node’s relationship to other nodes. A cell marked “n/a” means the property does not apply to that node type.
Table 1-1. Node relationship properties
Node type | Parent | Children | Attributes | Namespace nodes |
---|---|---|---|---|
Root | n/a | Ordered list of 0 or more elements, PIs, comments, and text nodes | n/a | n/a |
Element | Element or root | Ordered list of 0 or more elements, PIs, comments, and text nodes | Unordered list of 0 or more attribute nodes | Unordered list of 1 or more namespace nodes |
PI | Element or root | n/a | n/a | n/a |
Comment | Element or root | n/a | n/a | n/a |
Text | Element or root | n/a | n/a | n/a |
Attribute | Element | n/a | n/a | n/a |
Namespace | Element | n/a | n/a | n/a |
In the XPath language, to access a node’s parent, child nodes, attributes, or namespace nodes, use the corresponding axis: parent, child, attribute, or namespace. See the section Axes in Chapter 2.
Tip
Attributes and namespace nodes are not children. An element is considered to be the parent of an attribute or namespace node, but the attribute or namespace node is not considered to be the element’s child.
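The same separation shows up when you walk a parsed tree. This minidom sketch lists an element’s children and its attributes separately; the sample element is illustrative. One caveat: the DOM models the attribute-to-element link differently from XPath. It leaves Attr.parentNode as None and exposes the owning element as ownerElement, which stands in for XPath’s parent axis here.

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<item id="42" xmlns:p="urn:p">text<sub/></item>')
item = doc.documentElement

# Child axis: only the text node and <sub/>; attributes never appear here.
children = [n.nodeType for n in item.childNodes]

# Attribute axis: namespace declarations are excluded, as in XPath.
attributes = [
    name for name, _ in item.attributes.items()
    if name != "xmlns" and not name.startswith("xmlns:")
]

# Parent axis from an attribute: XPath says the parent is <item>. The DOM
# expresses this as ownerElement and leaves Attr.parentNode as None.
id_attr = item.getAttributeNode("id")

print(children == [Node.TEXT_NODE, Node.ELEMENT_NODE])  # True
print(attributes)                                       # ['id']
print(id_attr.ownerElement is item)                     # True
```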
The descendants of a node consist of the node’s children, its children’s children, and so on.
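Expressed as code, the descendant axis is a recursive walk over children only. The descendants generator below is a hypothetical helper written against minidom. Because it follows childNodes alone, attributes and namespace declarations never show up, which matches XPath.

```python
from xml.dom import minidom


def descendants(node):
    """Yield node's descendants in document order, like XPath's descendant axis.

    Only childNodes are followed, so attributes and namespace
    declarations are never visited.
    """
    for child in node.childNodes:
        yield child
        yield from descendants(child)


doc = minidom.parseString('<a x="1"><b><c/></b>tail<d/></a>')
print([n.nodeName for n in descendants(doc)])
# ['a', 'b', 'c', '#text', 'd']
```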
All nodes, regardless of their type, have a string-value and a base URI. Some types of nodes have an expanded-name, which consists of two strings: a local part and a namespace URI. Element nodes have an optional unique ID, and the root node holds the URIs of the document’s unparsed entities. For each of the string-typed node properties, Table 1-2 lists the node types it applies to and how its value is determined. Once again, a cell marked “n/a” means the property does not apply to that node type.
Table 1-2. String-typed node properties
Node type | String-value | Expanded-name (local/URI) | Base URI | Unique ID | Unparsed entity URIs |
---|---|---|---|---|---|
Root | Concatenation of descendant text nodes’ string-values, in document order | n/a | URI of the document entity | n/a | A set of mappings between declared entity names and their URIs |
Element | Concatenation of descendant text nodes’ string-values, in document order | Local: local name; URI: namespace name | URI of the external entity in which the element occurs; otherwise, base URI of root | Value of the attribute declared as type ID | n/a |
PI | Text following PI target and whitespace | Local: PI target; URI: null | URI of the external entity in which the PI occurs; otherwise, base URI of root | n/a | n/a |
Comment | Content of comment | n/a | Base URI of parent node | n/a | n/a |
Text | Character data (at least one character) | n/a | Base URI of parent node | n/a | n/a |
Attribute | Normalized attribute value | Local: local name; URI: namespace name | Base URI of parent node | n/a | n/a |
Namespace | Namespace URI | Local: namespace prefix; URI: null | Base URI of parent node | n/a | n/a |
The XPath language provides functions for directly accessing most of these properties. To access the string-value of a node, use the string( ) function.
Tip
It’s not usually necessary to use string( ) explicitly, thanks to XPath’s automatic conversion of data types. See the Data Type Conversions section in Chapter 5.
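The string-value rules in Table 1-2 can be approximated on a minidom tree as follows. This is a sketch, not how any particular XSLT processor works. string_value is a hypothetical helper, and it relies on the parser having already expanded entities and normalized attribute values. Applied to a single node, it returns roughly what string( ) would.

```python
from xml.dom import minidom, Node


def string_value(node):
    """Sketch of the XPath 1.0 string-value rules (Table 1-2) on a minidom tree."""
    kind = node.nodeType
    if kind in (Node.DOCUMENT_NODE, Node.ELEMENT_NODE):
        # Root and element: all descendant text, in document order.
        # Comments and PIs are skipped; they contribute nothing.
        return "".join(
            string_value(child)
            for child in node.childNodes
            if child.nodeType in (Node.ELEMENT_NODE, Node.TEXT_NODE,
                                  Node.CDATA_SECTION_NODE)
        )
    if kind == Node.ATTRIBUTE_NODE:
        return node.value  # the parser has already normalized it
    # Text, comment, and PI nodes: their own character content
    # (for a PI, the text after the target and whitespace).
    return node.data


doc = minidom.parseString(
    '<?note check this?><p lang="en">Hello, <b>XSLT</b><!-- x -->!</p>'
)
print(string_value(doc))             # Hello, XSLT!
print(string_value(doc.firstChild))  # check this
```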
To access the local and namespace URI parts of a node’s expanded-name, use the local-name( ) and namespace-uri( ) functions, respectively.
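The hypothetical expanded_name helper below gives a rough model of what local-name( ) and namespace-uri( ) report. It reads the corresponding properties of a namespace-aware minidom tree. Where XPath has a null URI or no expanded-name, the helper returns None, whereas the XPath functions return the empty string in those cases. Namespace nodes are left out because the DOM has no equivalent.

```python
from xml.dom import minidom, Node


def expanded_name(node):
    """Return (local part, namespace URI), or None if the node has no
    expanded-name. None also stands in for a null namespace URI."""
    if node.nodeType in (Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE):
        return (node.localName, node.namespaceURI)
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return (node.target, None)
    return None  # root, text, and comment nodes


doc = minidom.parseString(
    '<x:doc xmlns:x="urn:example" x:version="1"><plain/><?tidy on?>text</x:doc>'
)
root_el = doc.documentElement
version = root_el.getAttributeNodeNS("urn:example", "version")

print(expanded_name(root_el))                # ('doc', 'urn:example')
print(expanded_name(version))                # ('version', 'urn:example')
print(expanded_name(root_el.childNodes[0]))  # ('plain', None)
```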
The base URI property is used for resolving relative URIs in a document, and it is used by XSLT’s document( ) function and the xsl:import and xsl:include elements. XSLT/XPath 1.0 does not provide a direct way to access the base URI property.
Tip
XPath 2.0, however, includes a function, base-uri( ), for directly accessing the base URI of a given node. It also takes the xml:base attribute into account when determining a node’s base URI, which XSLT 1.0 does not.
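The XSLT 1.0 base URI rules from Table 1-2, and the way a relative reference is resolved against them, can be sketched with Python’s urllib.parse.urljoin.

- SketchNode is a hypothetical stand-in for a real tree node. It records only a parent link and, where applicable, the URI of the entity the node begins in.
- Like XSLT 1.0, the sketch ignores xml:base.

A processor resolving a document( ) argument or an xsl:include href performs a comparable step, though its exact behavior may differ.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class SketchNode:
    """Hypothetical minimal node carrying just enough to model base URIs."""
    parent: Optional["SketchNode"] = None
    # Set on the root (the document's URI) and on any element or PI that
    # begins in an external entity (that entity's URI); otherwise None.
    entity_uri: Optional[str] = None


def base_uri(node):
    # Use the node's own entity URI if it has one; otherwise inherit from
    # the parent, which eventually reaches the root's document URI.
    while node is not None:
        if node.entity_uri is not None:
            return node.entity_uri
        node = node.parent
    return None


def resolve(relative, node):
    """Resolve a relative URI (for example, an href) against node's base URI."""
    return urljoin(base_uri(node) or "", relative)


root = SketchNode(entity_uri="http://example.com/books/catalog.xml")
chapter = SketchNode(parent=root)  # element in the document entity
part = SketchNode(parent=chapter,
                  entity_uri="http://example.com/books/parts/ch2.xml")
text = SketchNode(parent=part)     # text inherits from its parent

print(resolve("../index.xml", chapter))  # http://example.com/index.xml
print(resolve("figures/f1.svg", text))   # http://example.com/books/parts/figures/f1.svg
```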
The unique ID property is queried by the id( ) function to retrieve elements according to their ID value. There is no function to access the unique ID property directly, but that is not normally necessary, since you can easily access an element’s attribute values using the attribute axis.
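The lookup that id( ) performs can be sketched as an index from ID values to elements. In this illustration, build_id_index and xpath_id are hypothetical helpers. The id_attributes dictionary stands in for the DTD’s attribute-type declarations, which a real processor would read itself. The sketch shows two things:

- Only attributes declared as type ID count.
- A whitespace-separated string selects several elements at once, returned in document order.

```python
from xml.dom import minidom, Node


def elements_in_order(node):
    """Yield element descendants of node in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child
            yield from elements_in_order(child)


def build_id_index(doc, id_attributes):
    """Map each unique ID value to (document position, element).

    id_attributes is a hypothetical stand-in for the DTD: it maps an element
    name to the name of its attribute declared as type ID.
    """
    index = {}
    for position, element in enumerate(elements_in_order(doc)):
        attr_name = id_attributes.get(element.tagName)
        if attr_name and element.hasAttribute(attr_name):
            # If an invalid document repeats an ID, keep the first element.
            index.setdefault(element.getAttribute(attr_name), (position, element))
    return index


def xpath_id(index, ids):
    """Like XPath's id("a b ..."): whitespace-separated IDs in, elements out
    in document order, with duplicates and unknown IDs dropped."""
    hits = {index[t][0]: index[t][1] for t in ids.split() if t in index}
    return [hits[pos] for pos in sorted(hits)]


doc = minidom.parseString(
    '<book><sec key="s1"/><sec key="s2"/><note key="n1"/></book>'
)
index = build_id_index(doc, {"sec": "key"})
print([e.getAttribute("key") for e in xpath_id(index, " s2  s1 n1 s2")])
# ['s1', 's2']  (n1 is not an ID: nothing declares one for <note>)
```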
Finally, use the unparsed-entity-uri( ) function to retrieve the URI of an unparsed entity with a given name.
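The root node’s unparsed-entity mappings can be collected with Python’s standard xml.parsers.expat module. Its EntityDeclHandler reports a notation name only for unparsed (NDATA) entities. The helpers unparsed_entity_uris and unparsed_entity_uri are hypothetical, and the sample document and base URI are made up. The sketch resolves each system identifier against the base URI set with SetBase, on the assumption that an absolute URI is wanted. Like the XPath function, the lookup returns an empty string for an unknown name.

```python
import xml.parsers.expat
from urllib.parse import urljoin


def unparsed_entity_uris(xml_text, document_uri):
    """Return {entity name: absolute URI} for every unparsed entity declared."""
    uris = {}

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation_name is not None and not is_parameter_entity:
            uris[name] = urljoin(base or document_uri, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.SetBase(document_uri)  # base for the declarations' system IDs
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return uris


def unparsed_entity_uri(uris, name):
    # Mirrors XPath's unparsed-entity-uri(): "" when there is no such entity.
    return uris.get(name, "")


DOC = """<!DOCTYPE doc [
<!NOTATION png SYSTEM "image/png">
<!ENTITY logo SYSTEM "images/logo.png" NDATA png>
<!ENTITY title "XSLT">
]>
<doc/>"""

uris = unparsed_entity_uris(DOC, "http://example.com/docs/book.xml")
print(unparsed_entity_uri(uris, "logo"))         # http://example.com/docs/images/logo.png
print(repr(unparsed_entity_uri(uris, "title")))  # ''
```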
All of XPath and XSLT’s built-in functions are described in Chapter 5.