Neo, sooner or later you’re going to realize just as I did that there’s a difference between knowing the path and walking the path.
Morpheus (The Matrix)
XPath is an expression language that is fundamental to XML processing. You can no more master XSLT without mastering XPath than you can master English without learning the alphabet. Several readers of the first edition of XSLT Cookbook took me to task for not covering XPath. This chapter has been added partly to appease them but more so due to the greatly increased power of the latest XPath 2.0 specifications. However, many of these recipes are applicable to XPath 1.0 as well.
In XSLT 1.0, XPath plays three crucial roles. First, it is used within templates for addressing into the document to extract data as it is being transformed. Second, XPath syntax is used as a pattern language in the matching rules for templates. Third, it is used to perform simple math and string manipulations via built-in XPath operators and functions.
XSLT 2.0 retains and strengthens this intimate connection with XPath 2.0 by drawing heavily on the new computational abilities of XPath 2.0. In fact, one can make a reasonable argument that the enhanced capabilities of XSLT 2.0 stem largely from the advances in XPath 2.0. The new XPath 2.0 facilities include sequences, regular expressions, conditional and iterative expressions, and an enhanced XML Schema-compliant type system, as well as a large number of new built-in functions.
Each recipe in this chapter is a collection of mini-recipes for
solving certain classes of XPath problems that often arise while
using XSLT. We annotate each XPath expression with XPath 2.0
comments of the form (: comment :), but users of
XPath/XSLT 1.0 should be aware that these comments are not legal
syntax there. When we show the result of an XPath evaluation that is
empty, we write (), which happens to be the
way one writes a literal empty sequence in XPath 2.0.
Each of the following solutions is organized around related sets of axes. For each group, a sample XML document is presented with the context node in bold. An explanation of the effect of evaluating the path is provided, along with an indication of the nodes that will be selected with respect to the highlighted context. In some cases, the solution will consider other nodes as the context to illustrate subtleties of the particular path expression.
The child axis is the default axis in XPath. This means you do not need to use the
child:: axis specification, but you can if you are
feeling pedantic. You can reach deeper into the XML tree using the
descendant:: and the
descendant-or-self:: axes. The former excludes the
context node and the latter includes it.
<!-- Sample document; the context node is the parent element -->
<Test id="descendants">
  <parent>
    <X id="1"/>
    <X id="2"/>
    <Y id="3">
      <X id="3-1"/>
      <Y id="3-2"/>
      <X id="3-3"/>
    </Y>
    <X id="4"/>
    <Y id="5"/>
    <Z id="6"/>
    <X id="7"/>
    <X id="8"/>
    <Y id="9"/>
  </parent>
</Test>

(: Select all child elements named X. :)
X  (: same as child::X :)
Result: <X id="1"/> <X id="2"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the first X child element. :)
X[1]
Result: <X id="1"/>

(: Select the last X child element. :)
X[last()]
Result: <X id="8"/>

(: Select the first child element, provided it is an X. Otherwise empty. :)
*[1][self::X]
Result: <X id="1"/>

(: Select the last child, provided it is an X. Otherwise empty. :)
*[last()][self::X]
Result: ()

*[last()][self::Y]
Result: <Y id="9"/>

(: Select all descendants named X. :)
descendant::X
Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node, if it is an X, and all descendants named X. :)
descendant-or-self::X
Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node and all descendant elements. :)
descendant-or-self::*
Result:
<parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y> <X id="4"/> <Y id="5"/> <Z id="6"/> <X id="7"/> <X id="8"/> <Y id="9"/> </parent>
<X id="1"/>
<X id="2"/>
<Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y>
<X id="3-1"/>
<Y id="3-2"/>
<X id="3-3"/>
<X id="4"/>
<Y id="5"/>
<Z id="6"/>
<X id="7"/>
<X id="8"/>
<Y id="9"/>
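The child and descendant selections above can be tried outside an XSLT processor. The following is a minimal sketch in Python, assuming the standard library's ElementTree module, which supports only a small XPath subset (so `.//X` stands in for `descendant::X`). The helper name `ids` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<Test id="descendants">
  <parent>
    <X id="1"/> <X id="2"/>
    <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y>
    <X id="4"/> <Y id="5"/> <Z id="6"/>
    <X id="7"/> <X id="8"/> <Y id="9"/>
  </parent>
</Test>
"""

def ids(elements):
    """Hypothetical helper: return the id attribute of each selected element."""
    return [e.get("id") for e in elements]

# The <parent> element plays the role of the context node.
context = ET.fromstring(DOC).find("parent")

children = ids(context.findall("X"))          # X (same as child::X)
first = ids(context.findall("X[1]"))          # X[1]
last = ids(context.findall("X[last()]"))      # X[last()]
descendants = ids(context.findall(".//X"))    # stands in for descendant::X

print(children, first, last, descendants)
```

ElementTree has no sibling, ancestor, preceding, or following axes, so this sketch only covers the child and descendant cases.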
The sibling axes include preceding-sibling:: and
following-sibling::. As the names suggest, the
preceding-sibling axis consists of siblings that precede the context
node and the following-sibling axis consists of siblings that follow
it. Siblings are, of course, child nodes that
share the same parent. Most of the examples below use
preceding-sibling::, but you should be able to
work out the results for following-sibling::
without too much trouble.
Keep in mind that when using a positional path expression of the form
preceding-sibling::*[1], you are referring to the
immediately preceding sibling looking back from the context node and
not the first sibling in document order. Some people get confused
because the resulting sequence is in document order regardless of
whether you use preceding-sibling:: or
following-sibling::. Although not an axis
expression per se,
../X is a way of saying,
select both preceding and following siblings named X as well as the
context node, should it be an X. More formally speaking, it is an
abbreviation for parent::node()/X. Note that
(preceding-sibling::*)[1] and (following-sibling::*)[1]
will select the first preceding/following
sibling in document order.
<!-- Sample document; the context node is <A id="7"/> -->
<Test id="preceding-siblings">
  <A id="1"/>
  <A id="2"/>
  <B id="3"/>
  <A id="4"/>
  <B id="5"/>
  <C id="6"/>
  <A id="7"/>
  <A id="8"/>
  <B id="9"/>
</Test>

(: Select all A sibling elements that precede the context node. :)
preceding-sibling::A
Result: <A id="1"/> <A id="2"/> <A id="4"/>

(: Select all A sibling elements that follow the context node. :)
following-sibling::A
Result: <A id="8"/>

(: Select all sibling elements that precede the context node. :)
preceding-sibling::*
Result: <A id="1"/> <A id="2"/> <B id="3"/> <A id="4"/> <B id="5"/> <C id="6"/>

(: Select the first preceding sibling element named A in reverse document order. :)
preceding-sibling::A[1]
Result: <A id="4"/>

(: The first preceding element in reverse document order, provided it is an A. :)
preceding-sibling::*[1][self::A]
Result: ()
(: If the context was <A id="8"/>, the result would be <A id="7"/>. :)

(: All preceding sibling elements that are not A elements. :)
preceding-sibling::*[not(self::A)]
Result: <B id="3"/> <B id="5"/> <C id="6"/>

(: For the following recipes use this document. The context node is again <A id="7"/>. :)
<Test id="preceding-siblings">
  <A id="1">
    <A/>
  </A>
  <A id="2"/>
  <B id="3">
    <A/>
  </B>
  <A id="4"/>
  <B id="5"/>
  <C id="6"/>
  <A id="7"/>
  <A id="8"/>
  <B id="9"/>
</Test>

(: The element directly preceding the context, provided it has a child element A. :)
preceding-sibling::*[1][A]
Result: ()

(: The first element preceding the context that has a child A. :)
preceding-sibling::*[A][1]
Result: <B id="3"> ...

(: XPath 2.0 allows more flexibility to select elements with respect to namespaces. For these recipes the following XML document applies. The context node is <NS:B id="3"/>. :)
<Test xmlns:NS="http://www.ora.com/xstlcbk/1" xmlns:NS2="http://www.ora.com/xstlcbk/2">
  <NS:A id="1"/>
  <NS2:A id="2"/>
  <NS:B id="3"/>
  <NS2:B id="3"/>
</Test>

(: Select the preceding sibling elements of the context whose namespace is the namespace associated with prefix NS. :)
preceding-sibling::NS:*
Result: <NS:A id="1"/>

(: Select the preceding sibling elements of the context whose local name is A. :)
preceding-sibling::*:A
Result: <NS:A id="1"/>, <NS2:A id="2"/>
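The difference between the positional forms is easy to get wrong, so here is a small Python model of the preceding-sibling axis. It is only a sketch: the standard ElementTree module has no sibling axes, so the hypothetical helper `preceding_siblings` builds the axis by hand, nearest sibling first, and list slicing imitates the `[1]` predicates.

```python
import xml.etree.ElementTree as ET

DOC = """
<Test id="preceding-siblings">
  <A id="1"/> <A id="2"/> <B id="3"/> <A id="4"/> <B id="5"/>
  <C id="6"/> <A id="7"/> <A id="8"/> <B id="9"/>
</Test>
"""

def preceding_siblings(parent, context):
    """Hypothetical helper: siblings before context, nearest first (reverse axis order)."""
    children = list(parent)
    return children[:children.index(context)][::-1]

root = ET.fromstring(DOC)
context = root.find("A[@id='7']")
axis = preceding_siblings(root, context)

# preceding-sibling::A[1]: the nearest preceding sibling named A.
nearest_a = [e for e in axis if e.tag == "A"][:1]

# preceding-sibling::*[1][self::A]: the nearest preceding sibling, kept only if it is an A.
nearest_if_a = [e for e in axis[:1] if e.tag == "A"]

# (preceding-sibling::A)[1]: the first preceding A in document order.
first_a = [e for e in reversed(axis) if e.tag == "A"][:1]

print([e.get("id") for e in nearest_a],
      [e.get("id") for e in nearest_if_a],
      [e.get("id") for e in first_a])
```

The three selections differ only in where the position test is applied: to the filtered axis, to the whole axis, or to the parenthesized result in document order.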
The parent axis (parent::) refers to the parent of the
context node. The expression
parent::X should not
be confused with
../X. The former will produce a
sequence of exactly one element provided the parent of the context is
X or empty otherwise. The latter is a shorthand for
parent::node()/X, which will select all siblings
of the context node named X, including the context itself, should it
be an X.
One can navigate to higher levels of
the XML tree (parents, grandparents,
great-grandparents, and so on) using either
ancestor:: or ancestor-or-self::. The former excludes the
context and the latter includes it.
(: Select the parent of the context node, provided it is an X element. Empty otherwise. :)
parent::X

(: Select the parent element of the context node. Can only be empty if the context is the top-level element. :)
parent::*

(: Select the parent if it is in the namespace associated with the prefix NS. The prefix must be defined; otherwise, it is an error. :)
parent::NS:*

(: Select the parent, regardless of its namespace, provided the local name is X. :)
parent::*:X

(: Select all ancestor elements (including the parent) named X. :)
ancestor::X

(: Select the context, provided it is an X, and all ancestor elements named X. :)
ancestor-or-self::X
The preceding and following axes have the potential to select a large number of nodes, because they consider all nodes that come before (or after) the context node in document order, with two exceptions: the preceding axis excludes ancestors, and the following axis excludes descendants. Also, don’t forget that both axes exclude namespace nodes and attributes.
(: All preceding element nodes named X. :)
preceding::X

(: The closest preceding element node named X. :)
preceding::X[1]

(: The furthest following element node named X. :)
following::X[last()]
XPath uses the notion of an axis to partition the document tree into subsets relative to some node called the context node. In general, these subsets overlap, but the ancestor, descendant, following, preceding, and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap, and together they contain all the nodes in the document. The context node is established by the XPath hosting language. In XSLT, the context is set via:
a template match (<xsl:template match="x"> ... </xsl:template>)
Effectively wielding the kinds of path expression shown in the
solution is key to performing both simple and complex
transformations. Experience with traditional programming languages
sometimes leads to confusion and mistakes when using XPath. For
example, I often used to catch myself writing something like
<xsl:if test="preceding-sibling::X[1]"> </xsl:if>
when I really intended
<xsl:if test="preceding-sibling::*[1][self::X]"> </xsl:if>. This is probably because the latter is a
less than intuitive way of saying “test if the
immediately preceding sibling is an X.”
It is, of course, impossible to show every useful permutation
of path expressions using axes. But if
you understand the building blocks presented previously you are well
on your way to decoding the
meaning of constructs such as
Many of the mini-recipes in Recipe 1.1 used
predicates to filter nodes, but those predicates were based strictly
on position of the node or node name. Here we consider a variety of
predicates that filter based on data content. In these examples, we
use a simple child element path
X before each
predicate, but one could equally substitute any path expression for
X, including those in Recipe 1.1.
In the following examples, we use the XPath 2.0
value comparison operators (eq, ne, lt, le, gt, and
ge) instead of the general comparison operators (=, !=, <,
<=, >, and
>=). This is because when one is comparing
atomic values, the new operators are preferred. In XPath 1.0, you
only have the latter operators, so make the appropriate substitution.
The new operators were introduced in XPath 2.0 because they have
simpler semantics and will probably be more efficient as a result.
The complexity of the old operators arises in cases
where a sequence is on either side of the comparison. Recipe 1.8 covers this topic further.
Another point must be made for those working in XPath 2.0 because
that version incorporates type information when a schema is
available. That could lead to some of the expressions below
having type errors. For example,
X[@a = 10] is not
the same as
X[@a = '10'] when the attribute
a has an integer type. Here we assume there is no
schema and therefore all atomic values have the type
untypedAtomic. You can find more on this topic in
Recipes 1.9 and 1.10.
(: Select X child elements that have an attribute named a. :)
X[@a]

(: Select X children that have at least one attribute. :)
X[@*]

(: Select X children that have at least three attributes. :)
X[count(@*) > 2]

(: Select X children whose attributes sum to a value less than 7. :)
X[sum(for $a in @* return number($a)) < 7]
(: In XSLT 1.0 use sum(@*) < 7 :)

(: Select X children that have no attributes named a. :)
X[not(@a)]

(: Select X children that have no attributes. :)
X[not(@*)]

(: Select X children that have an attribute named a with value '10'. :)
X[@a eq '10']

(: Select X children that have a child named Z with value '10'. :)
X[Z eq '10']

(: Select X children that have a child named Z with value not equal to '10'. :)
X[Z ne '10']

(: Select X children if they have at least one child text node. :)
X[text()]

(: Select X children if they have a text node with at least one non-whitespace character. :)
X[text()[normalize-space(.)]]

(: Select X children if they have any child node. :)
X[node()]

(: Select X children if they contain a comment node. :)
X[comment()]

(: Select X children if they have an @a whose numerical value is less than 10. This expression will work equally well in XPath 1.0 and 2.0 regardless of whether @a is a string or a numeric type. :)
X[number(@a) < 10]

(: Select X if it has at least one preceding sibling named Z with an attribute y that is not equal to 10. The general comparison != is used here because the path can select several attributes; ne would raise an error in that case. :)
X[preceding-sibling::Z/@y != '10']

(: Select X children whose string-value consists of a single space character. :)
X[. = ' ']

(: An odd way of getting an empty sequence! :)
X[false()]

(: Same as X. :)
X[true()]

(: X elements with exactly 5 child elements. :)
X[count(*) eq 5]

(: X elements with exactly 5 child nodes (including element, text, comment, and PI nodes but not attribute nodes). :)
X[count(node()) eq 5]

(: X elements with exactly 5 nodes of any kind. :)
X[count(@* | node()) eq 5]

(: The first X child, provided it has the value 'some text'; empty otherwise. :)
X[1][. eq 'some text']

(: Select all X children with the value 'some text' and return the first or empty if there is no such child. In simpler words, the first X child element that has the string-value 'some text'. :)
X[. eq 'some text'][1]
As with Recipe 1.1, it is
impossible to cover every interesting permutation of filtering
predicates. However, mastering the themes exemplified above should
help you develop almost any
filtering expression you desire. Also consider that one can create
more complex conditions using the logical operators
and and or and the function not():
X[number(@a) > 5 and number(@a) < 10]
When using predicates with complex path expressions, you need to understand the effect of parentheses.
(: Select the first Y child of every X child of the context node. This expression can result in a sequence of more than one Y. :)
X/Y[1]

(: Select the sequence of nodes X/Y and then take the first. This expression can select at most one Y. :)
(X/Y)[1]
(: The empty node set :) /..
(: The empty sequence constructor. :)
()

(: Sequence consisting of the single atomic item 1. :)
1

(: Use the comma operator to construct a sequence. Here we build a sequence of all X children of the context, followed by Y children, followed by Z children. :)
X, Y, Z

(: Use the to operator to construct ranges. :)
1 to 10

(: Here we combine comma with several ranges. :)
1 to 10, 100 to 110, 17, 19, 23

(: Variables and functions can be used as well. :)
1 to $x
1 to count(para)

(: Sequences do not nest so the following two sequences are the same. :)
((1,2,3), (4,5, (6,7), 8, 9, 10))
1,2,3,4,5,6,7,8,9,10

(: The to operator cannot create a decreasing sequence directly. :)
10 to 1  (: This sequence is empty! :)

(: You can accomplish the intended effect with the following. :)
for $n in 1 to 10 return 11 - $n

(: Remove duplicates from a sequence. :)
distinct-values($seq)

(: Return the size of a sequence. :)
count($seq)

(: Test if a sequence is empty. :)
empty($seq)  (: prefer over count($seq) eq 0 :)

(: Locate the positions of an item in a sequence. index-of produces a sequence of integers, one for every item in the first argument that is eq to the second. :)
index-of($seq, $item)

(: Extract subsequences. :)
(: Up to 3 items from $seq, starting with the second. :)
subsequence($seq, 2, 3)

(: All items from $seq at position 3 to the end of the sequence. :)
subsequence($seq, 3)

(: Insert a sequence, $seq2, before the 3rd item in an input sequence, $seq1. :)
insert-before($seq1, 3, $seq2)

(: Construct a new sequence that contains all the items of $seq1 except the 3rd. :)
remove($seq1, 3)

(: If you need to remove several elements, you might consider an expression like the following. :)
$seq1[not(position() = (1,3,5))]
$seq1[position() gt 3 and position() lt 7]
In XPath 2.0, every data item (value) is a sequence. Thus, the atomic value 1 is just as much a sequence as the result of the expression (1 to 10). Another way of saying this is that every XPath 2.0 expression evaluates to a sequence. A sequence can contain zero or more items, and these items can be nodes, atomic values, or a mixture of both. Order is significant when comparing sequences. You refer to the individual items of a sequence starting at position 1 (not 0, as someone with a C/Java background might expect).
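For readers more at home with zero-based lists, the following Python sketch models the 1-based positions used by a few of the XPath 2.0 sequence functions. It is an approximation under stated assumptions: the function names are made up to mirror the XPath ones, arguments are assumed to be integers, and Python lists stand in for flat sequences.

```python
def subsequence(seq, start, length=None):
    """Model of fn:subsequence for integer arguments; positions are 1-based."""
    begin = max(start, 1) - 1
    end = None if length is None else max(start + length, 1) - 1
    return seq[begin:end]

def insert_before(seq, position, inserts):
    """Model of fn:insert-before: inserts go in front of the item at position."""
    index = min(max(position, 1), len(seq) + 1) - 1
    return seq[:index] + inserts + seq[index:]

def remove(seq, position):
    """Model of fn:remove: drop the item at position, if there is one."""
    return [item for pos, item in enumerate(seq, start=1) if pos != position]

def index_of(seq, target):
    """Model of fn:index-of: every 1-based position whose item equals target."""
    return [pos for pos, item in enumerate(seq, start=1) if item == target]

seq = [10, 20, 30, 40, 50]
print(subsequence(seq, 2, 3))             # like subsequence($seq, 2, 3)
print(subsequence(seq, 3))                # like subsequence($seq, 3)
print(insert_before(seq, 3, [1, 2]))      # like insert-before($seq1, 3, $seq2)
print(remove(seq, 3))                     # like remove($seq1, 3)
print(index_of([1, 2, 1], 1))             # like index-of($seq, $item)
```

Each function returns a new list, which matches the way XPath functions construct new sequences rather than modifying their arguments.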
XPath 1.0 does not have sequences but rather node sets. Node sets are not as tidy a concept as sequences, but in many cases, the distinction is irrelevant. For example, any XPath 1.0 expression that uses the count() function should behave the same in 2.0 (empty(), by contrast, is new in XPath 2.0). The advantage of XPath 2.0 is that a sequence is a first-class construct that can be explicitly constructed and manipulated using a variety of new XPath 2.0 functions. The recipes in this section introduce many important sequence idioms, and you will find many others sprinkled through the recipes of this book.
There are a few tricks you can play in XPath 1.0 to avoid using
XSLT’s verbose xsl:choose in
simple situations. These tricks rely on the fact that false converts
to 0 and true to 1 when used in a mathematical context.
So, for example, min, max, and absolute value can be calculated directly in XPath 1.0. In these examples, assume $x and $y contain integers.
(: min :)
($x <= $y) * $x + ($y < $x) * $y

(: max :)
($x >= $y) * $x + ($y > $x) * $y

(: abs :)
(1 - 2 * ($x < 0)) * $x
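The arithmetic behind these tricks can be checked in any language where a Boolean counts as 0 or 1 in a numeric context. The sketch below uses Python, where that happens to be the case; the function names are invented for illustration and simply transcribe the three XPath 1.0 expressions.

```python
def xpath1_min(x, y):
    # ($x <= $y) * $x + ($y < $x) * $y : exactly one comparison is true
    return (x <= y) * x + (y < x) * y

def xpath1_max(x, y):
    # ($x >= $y) * $x + ($y > $x) * $y
    return (x >= y) * x + (y > x) * y

def xpath1_abs(x):
    # (1 - 2 * ($x < 0)) * $x : the factor is -1 for negatives, 1 otherwise
    return (1 - 2 * (x < 0)) * x

for x, y in [(3, 7), (7, 3), (5, 5), (-2, 4)]:
    print(x, y, xpath1_min(x, y), xpath1_max(x, y), xpath1_abs(x))
```

Note that the two comparisons in each of the min and max expressions are chosen so that exactly one is true, even when the values are equal; using `<=` on both sides would count an equal value twice.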
(: Default the value of a missing attribute to 10. :)
if (@x) then @x else 10

(: Default the value of a missing element to 'unauthorized'. :)
if (password) then password else 'unauthorized'

(: Guard against division by zero. :)
if ($d ne 0) then $x div $d else 0

(: A para element's text if it contains at least one non-whitespace character; otherwise, a single space. :)
if (normalize-space(para)) then string(para) else ' '
If you are a veteran XSLT 1.0 programmer, you probably cringe every
time you need to add some conditional code to a template. I know I
do, and often go through pains to exploit XSLT’s
pattern-matching constructs to minimize conditional code. This is not
because such code is more complicated or inefficient in XSLT but
rather because it is so darn verbose. A simple
xsl:if is not that bad, but if you need to express
if-then-else logic, you are now forced to use the bulkier
xsl:choose.
In XSLT 2.0, there is an alternative but it is delivered in XPath 2.0
rather than XSLT 2.0 proper. On first exposure, one may get the
impression that XPath was somehow bastardized via the introduction of
what procedural programmers call flow of control statements. However,
once you begin to use XPath 2.0 in its full glory, you should quickly
conclude that both XPath and XSLT are bettered by these enhancements.
Further, the XPath 2.0 conditional expression does not deprecate the
xsl:if element but rather reduces the need to use
it in just those cases where it is most awkward. As an illustration,
compare the following snippets:
<!-- XSLT 1.0 -->
<xsl:variable name="size">
  <xsl:choose>
    <xsl:when test="$x > 3">big</xsl:when>
    <xsl:otherwise>small</xsl:otherwise>
  </xsl:choose>
</xsl:variable>

<!-- XSLT 2.0 -->
<xsl:variable name="size" select="if ($x gt 3) then 'big' else 'small'"/>
I think most readers will prefer the latter XPath 2.0 solution over the former XSLT 1.0 one.
One important fact about the XPath conditional expression is that the
else is not optional. C programmers can appreciate this by comparing
it to the
a ? b : c expression in that language.
Often one will use the empty sequence () when
there is no other sensible value for the else part
of the expression.
Conditional expressions are useful for defaulting in the absence of a schema that provides defaults.
(: Defaulting the value of an optional attribute :)
if (@optional) then @optional else 'some-default'

(: Defaulting the value of an optional element :)
if (optional) then optional else 'some-default'
Handling undefined or undesirable results in expressions is also a good application. In this example, we have an application-specific reason to prefer 0 rather than number('Infinity') as the result.
if ($divisor ne 0) then $dividend div $divisor else 0
You can also create conditions that are more complex. The following code decodes an enumerated list:
if (size eq 'XXL') then 50
else if (size eq 'XL') then 45
else if (size eq 'L') then 40
else if (size eq 'M') then 34
else if (size eq 'S') then 32
else if (size eq 'XS') then 29
else -1
However, in this case, you might find a solution using sequences to be cleaner especially if you replace the literal sequences with variables that might be initialized from an external XML file.
(50,45,40,34,32,29,-1)[(index-of(('XXL', 'XL', 'L', 'M', 'S', 'XS'), size), 7)[1]]
Here we are assuming the context has only a single
size child element; otherwise the expression is
illegal (but you can then write size[1] instead). We are also relying
on the fact that
index-of returns an empty
sequence when the search item is not found, which we concatenate with
7 to handle the default case.
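The lookup idiom may be easier to follow when its steps are spelled out. The Python sketch below models it with lists; `index_of` and `decode` are hypothetical names, and positions are kept 1-based to mirror XPath.

```python
SIZES = ["XXL", "XL", "L", "M", "S", "XS"]
VALUES = [50, 45, 40, 34, 32, 29, -1]   # the 7th item is the default

def index_of(seq, target):
    """Model of fn:index-of: every 1-based position whose item equals target."""
    return [pos for pos, item in enumerate(seq, start=1) if item == target]

def decode(size):
    """Model of VALUES[(index-of(SIZES, size), 7)[1]]."""
    # Appending 7 guarantees a first item even when index_of finds nothing.
    position = (index_of(SIZES, size) + [len(VALUES)])[0]
    return VALUES[position - 1]

print(decode("XL"), decode("S"), decode("unknown"))
```

Because the default position is appended rather than prepended, it is only reached when the search comes back empty.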
Not applicable in 1.0. Use a recursive XSLT template.
(: Sum of squares. :)
sum(for $x in $numbers return $x * $x)

(: Average of squares. :)
avg(for $x in $numbers return $x * $x)

(: Map a sequence of words in all paragraphs to a sequence of word lengths. :)
for $x in //para/tokenize(., ' ') return string-length($x)

(: Map a sequence of words in a paragraph to a sequence of word lengths for words greater than three letters. :)
for $x in //para/tokenize(., ' ') return if (string-length($x) gt 3) then string-length($x) else ()

(: Same as above but with a condition on the input sequence. :)
for $x in //para/tokenize(., ' ')[string-length(.) gt 3] return string-length($x)

(: Generate a sequence of squares of the first 100 integers. :)
for $i in 1 to 100 return $i * $i

(: Generate a sequence of squares in reverse order. :)
for $i in 0 to 10 return (10 - $i) * (10 - $i)

(: Map a sequence of paragraphs to a duped sequence of paragraphs. :)
for $x in //para return ($x, $x)

(: Duplicate words. :)
for $x in //para/tokenize(., ' ') return ($x, $x)

(: Map words to word followed by word length. :)
for $x in //para/tokenize(., ' ') return ($x, string-length($x))
As I indicated in Recipe 1.4, the addition
of control flow constructs into an expression language like XPath
might at first be perceived as odd or even misguided. You will
quickly overcome your doubts, however, when you experience the
liberating power of these XPath 2.0 constructs. This is especially
true for the XPath 2.0 for expression.
The power of
for becomes most apparent when one
considers how it can be applied to reduce many complicated recursive
XSLT 1.0 solutions to just a single XPath 2.0 expression. Consider
the problem of computing sums in XSLT 1.0. If all you need is a
simple sum, there is no problem because the built-in XPath 1.0 sum
function will do fine. However, if you need to compute the sum of
squares, you are forced to write a larger, more awkward, and less
transparent recursive template. In fact, a good portion of the first
edition of this book was recipes for canned solutions to these
recursive gymnastics. With XPath 2.0, a sum of squares becomes
nothing more than
sum(for $x in $numbers return $x * $x), where
$numbers contains the sequence
of numbers we wish to sum over. Think of the trees that I could have
saved if this facility was in XPath 1.0!
The for expression is hiding even more
power. You are not limited to just one iteration variable. Several
variables can be combined to create nested loops that create
sequences from interrelated nodes in a complex document.
(:Return a sequence consisting of para ids and the ids those para elements reference. :) for $s in /*/section, $p in $s/para, $r in $p/ref return ($p/@id, $r)
You should note that, other than being more compact, the preceding expression is not semantically different from the following:
for $s in /*/section return for $p in $s/para return for $r in $p/ref return ($p/@id, $r)
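Readers who know list comprehensions may find it helpful to compare the multi-variable for expression with its Python counterpart. The sketch below is only an analogy under stated assumptions: the `sections` data is invented, with dictionaries standing in for para elements, and the innermost loop reproduces the flattening that XPath applies to `($p/@id, $r)`.

```python
# Hypothetical data: each section is a list of paras, each para has an id and refs.
sections = [
    [{"id": "p1", "refs": ["s2", "s3"]}, {"id": "p2", "refs": []}],
    [{"id": "p3", "refs": ["s1"]}],
]

result = [
    item
    for section in sections           # for $s in /*/section
    for para in section               # $p in $s/para
    for ref in para["refs"]           # $r in $p/ref
    for item in (para["id"], ref)     # return ($p/@id, $r), flattened
]

print(result)
```

As in the XPath version, a para without any ref contributes nothing, because the innermost range is empty for it.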
You should also note that there is no need to use a nested
for when the sequence you are producing is more
elegantly represented by a traditional path expression.
(: This use of for is just a long-winded way of writing /*/section/para/ref. :) for $s in /*/section, $p in $s/para, $r in $p/ref return $r
Sometimes you might want to know the position of each item in a
sequence as you process it. You cannot use the
position() function as you would in an
xsl:for-each because an XPath for expression does
not alter the context position. However, you can achieve the effect
with the following expression:
for $pos in 1 to count($sequence), $item in $sequence[$pos] return ($item, $pos)
If the condition is based on equality, then the semantics of the
= and != operators in XPath 1.0
and 2.0 will suffice.
(: True if at least one section is referenced. :)
//section/@id = //ref/@idref

(: True if all section elements are referenced by some ref element. :)
count(//section) = count(//section[@id = //ref/@idref])
(: True if at least one section is referenced. :)
some $id in //section/@id satisfies $id = //ref/@idref

(: True if all section elements are referenced by some ref element. :)
every $id in //section/@id satisfies $id = //ref/@idref
However, you can go quite a bit further with less effort in XPath 2.0.
(: There exists a section that references every section except itself. :)
some $s in //section satisfies
  every $id in //section[@id ne $s/@id]/@id satisfies $id = $s/ref/@idref

(: $sequence2 is a sub-sequence of $sequence1 :)
count($sequence2) <= count($sequence1) and
  every $pos in 1 to count($sequence1),
        $item1 in $sequence1[$pos],
        $item2 in $sequence2[$pos]
  satisfies $item1 = $item2
If you remove the count check in the preceding expression, it would
assert that at least the first count($sequence1) items in
$sequence2 are the same as the corresponding
items in $sequence1.
The semantics of =, !=, <, >, <=, >= in XPath 1.0 and 2.0
sometimes surprise the uninitiated when one of the operands is a
sequence or XPath 1.0 node set. This is because the operators
evaluate to true if there is at least one pair of values from each
side of the expression which compare according to the relation. In
XPath 1.0, this can sometimes work to your advantage, as we have
shown previously, but other times it can leave your head spinning and
you longing to be back in the 5th grade where math made sense. For
example, one would guess that
$x = $x should
always be true, but if
$x is the empty sequence,
it is not! This follows from the fact that you cannot find a pair of
items within each empty sequence that are equal.
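The "at least one pair" rule can be modeled in a few lines of Python. This is a sketch of the semantics only, not of any processor's implementation: `general_compare` is a made-up name, lists stand in for sequences, and atomization and type conversion are ignored.

```python
import operator
from itertools import product

def general_compare(op, left, right):
    """True if some pair (a, b) drawn from left and right satisfies op(a, b)."""
    return any(op(a, b) for a, b in product(left, right))

empty = []
print(general_compare(operator.eq, empty, empty))        # models $x = $x for empty $x
print(general_compare(operator.eq, [1, 2], [2, 3]))      # models (1,2) = (2,3)
print(general_compare(operator.ne, [1, 2], [2, 3]))      # models (1,2) != (2,3)
```

Under this rule, = and != are not opposites: both can hold for the same pair of sequences, and both fail when either side is empty, which is why not($a = $b) and $a != $b mean different things.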
(: union :)
$set1 | $set2

(: intersection :)
$set1[count(. | $set2) = count($set2)]

(: difference :)
$set1[count(. | $set2) != count($set2)]
In XPath 2.0, node sets are replaced by sequences. Unlike node sets, sequences are ordered and can contain duplicate items. However, when using the XPath 2.0 set operations, duplicates and ordering are ignored so sequences behave just like sets. The result of a set operation will never contain duplicates even if the inputs did.
(: All attributes except @a. :)
@* except @a

(: All attributes except @a and @b. The parentheses are needed because the comma operator binds less tightly than except. :)
@* except (@a, @b)
In 1.0, one needs the following, more awkward expression:
@*[local-name(.) != 'a' and local-name(.) != 'b']
Interestingly enough, XPath only allows set operations over sequences of nodes. Atomic values are not allowed. This is because the set operations are over node identity and not value. One can get the effect of sets of values using the following XPath 2.0 expressions. For XPath 1.0, you will need to use XSLT recursion. See Chapter 8.
(: union :)
distinct-values( ($items1, $items2) )

(: intersection :)
distinct-values( $items1[. = $items2] )

(: difference :)
distinct-values( $items1[not(. = $items2)] )
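The same three value-based operations can be modeled in Python to see what each predicate keeps. This is a sketch under stated assumptions: the function names are invented, lists stand in for sequences of atomic values, and `distinct_values` keeps first occurrences in order, whereas XPath leaves the order of the distinct-values result up to the implementation.

```python
def distinct_values(items):
    """Model of fn:distinct-values, keeping the first occurrence of each value."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def union(items1, items2):
    # distinct-values( ($items1, $items2) )
    return distinct_values(items1 + items2)

def intersection(items1, items2):
    # distinct-values( $items1[. = $items2] )
    return distinct_values([x for x in items1 if x in items2])

def difference(items1, items2):
    # distinct-values( $items1[not(. = $items2)] )
    return distinct_values([x for x in items1 if x not in items2])

a, b = [1, 2, 2, 3], [3, 4, 3]
print(union(a, b), intersection(a, b), difference(a, b))
```

The `x in items2` test plays the role of the general comparison `. = $items2`, which is true as soon as one item of the second sequence matches.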
(: Test if $x and $y are the same exact node. :)
generate-id($x) = generate-id($y)

(: You can also take advantage of the | operator's removal of duplicates. :)
count($x|$y) = 1

(: Test if $x precedes $y in document order - note that this does not work if $x or $y are attributes. :)
count($x/preceding::node()) < count($y/preceding::node()) or $x = $y/ancestor::node()

(: Test if $x follows $y in document order - note that this does not work if $x or $y are attributes. :)
count($x/following::node()) < count($y/following::node()) or $y = $x/ancestor::node()
The new XPath 2.0 node comparison operators are likely to be more
efficient and certainly easier to understand than the XPath 1.0
counterparts. However, if you are using XSLT 2.0, you will not find
too many situations where these operators are required. There are
many situations where you think you need << or >> when the
xsl:for-each-group element is preferred. See
Recipe 6.2 for examples.
Most incompatibilities between XPath/XSLT 1.0 and 2.0 come from type errors. This is true regardless of whether a schema is present or not. You can eliminate many problems encountered in porting legacy XSLT 1.0 to XSLT 2.0 with respect to XPath differences by running in 1.0 compatibility mode.
<xsl:stylesheet version="1.0"> <!-- ... --> </xsl:stylesheet>
In my opinion, eventually you will want to stop using compatibility mode. XPath 2.0 provides several facilities for dealing with type conversions. First, you can use conversion functions explicitly.
(: Convert the first X child of the context to a number. :)
number(X) + 17

(: Convert a number in $n to a string. :)
concat("id-", string($n))
XPath 2.0 also provides type constructors so you can explicitly control the interpretation of a string.
(: Construct a date from a string. :)
xs:date("2005-06-01")

(: Construct doubles from strings. :)
xs:double("1.1e8") + xs:double("23000")
Finally, XPath has the operators
castable as, cast as, and
treat as. Most of the time, you want to use the first
two.
if ($x castable as xs:date) then $x cast as xs:date else xs:date("1970-01-01")
The third operator, treat as, is not
a conversion per se but rather an assertion that tells the XPath
processor that you promise a value will conform to a
specified type at runtime. If this turns out not to be the case, then a type
error will occur. XPath 2.0 added
treat as so XPath implementers could perform static
(compile-time) type checking in addition to dynamic type checking
while allowing programmers to selectively disable static type checks.
XSLT 2.0 implementations that perform static type checking will likely be rare, so
you can ignore
treat as for the
time being. It is far more likely to arise in higher-end XQuery
processors that do static type checking to facilitate various
optimizations.
Running in 1.0 compatibility mode with an XSLT 2.0 processor does not mean you cannot use any of the new 2.0 features. It simply enables certain XPath 1.0 conversion rules.
It allows non-numeric types used in a context where numbers are
expected to convert automatically to numbers via atomization followed
by application of the number() function.
(: In compatibility mode, the following evaluates to 18.1 but is a type error in 2.0. :)
"1.1" + "17"

It allows non-string types used in a context where strings are
expected to convert automatically to strings via atomization followed
by application of the string() function.
(: In compatibility mode, the following evaluates to 2 but is a type error in 2.0. :)
string-length(1 + 2 + 3 + 4 + 5)

It automatically discards all but the first item from sequences of size two or more when they are used in a context where a singleton is expected. This often happens when one is passing the result of a path expression to a function.
<poem>
  <line>There once was a programmer from Nantucket.</line>
  <line>Who liked his bits in a bucket.</line>
  <line>He said with a grin</line>
  <line>and drops of coffee on his chin,</line>
  <line>"If XSLT had a left-shift, I would love it!"</line>
</poem>

(: In compatibility mode, both expressions evaluate to 43 but the first is a type error in 2.0. :)
string-length(/poem/line)
string-length(/poem/line[1])
If you validate your documents against a schema, the resulting nodes become annotated with type information. You can then test for these types in XPath 2.0 (and while matching templates in XSLT 2.0).
(: Test if all invoiceDate elements have been validated as dates. The + occurrence indicator lets the test apply to a sequence of one or more elements. :)
if (order/invoiceDate instance of element(*, xs:date)+)
then "invoicing complete"
else "invoicing incomplete"
The instance of operator is only
useful in the presence of schema validation. In addition,
it is not the same as
castable as. For instance, for a positive integer literal such as 10,
10 castable as xs:positiveInteger is always true but
10 instance of xs:positiveInteger is never
true because literal integer types are labeled as
xs:integer.
However, the benefit of validation is not simply the ability to test
with instance of but rather the safety
and convenience of knowing that there will be no type error surprises
once validation is passed. This can lead to more concise stylesheets.
(: Without validation, you should code like this. :)
for $order in Order
return xs:date($order/invoiceDate) - xs:date($order/createDate)

(: If you know all date elements have been validated, you can dispense with the xs:date constructor. :)
for $order in Order
return $order/invoiceDate - $order/createDate
My personal preference is to use XML Schemas as specification documents and not validation tools. Therefore, I tend to write XSLT transformations in ways that are resilient to type errors and use explicit conversions where needed. Stylesheets written in this manner will work in the presence of validation or not.
Once you begin to write stylesheets that depend on validation, you are locked into implementations that perform validation. On the other hand, if your company standards say all XML documents will be schema-validated before processing, then you can simplify your XSLT based on assurances that certain data types will appear in certain situations.