Chapter 1. XPath
Neo, sooner or later you’re going to realize just as I did that there’s a difference between knowing the path and walking the path.
Morpheus (The Matrix)
Introduction
XPath is an expression language that is fundamental to XML processing. You can no more master XSLT without mastering XPath than you can master English without learning the alphabet. Several readers of the first edition of XSLT Cookbook took me to task for not covering XPath. This chapter has been added partly to appease them but more so due to the greatly increased power of the latest XPath 2.0 specifications. However, many of these recipes are applicable to XPath 1.0 as well.
In XSLT 1.0, XPath plays three crucial roles. First, it is used within templates for addressing into the document to extract data as it is being transformed. Second, XPath syntax is used as a pattern language in the matching rules for templates. Third, it is used to perform simple math and string manipulations via built-in XPath operators and functions.
XSLT 2.0 retains and strengthens this intimate connection with XPath 2.0 by drawing heavily on the new computational abilities of XPath 2.0. In fact, one can make a reasonable argument that the enhanced capabilities of XSLT 2.0 stem largely from the advances in XPath 2.0. The new XPath 2.0 facilities include sequences, regular expressions, conditional and iterative expressions, and an enhanced, XML Schema-compliant type system, as well as a large number of new built-in functions.
Each recipe in this chapter is a collection of mini-recipes for solving certain classes of XPath problems that often arise while using XSLT. We annotate each XPath expression with the XPath 2.0 commenting convention (: comment :), but users of XPath/XSLT 1.0 should be aware that these comments are not legal syntax there. When we are showing the result of an XPath evaluation that is empty, we will write (), which happens to be the way one writes a literal empty sequence in XPath 2.0.
1.1. Effectively Using Axes
Problem
You need to select nodes in an XML tree in ways that consider complex relationships within the hierarchical structure.
Solution
Each of the following solutions is organized around related sets of axes. For each group, a sample XML document is presented with the context node in bold. An explanation of the effect of evaluating the path is provided, along with an indication of the nodes that will be selected with respect to the highlighted context. In some cases, the solution will consider other nodes as the context to illustrate subtleties of the particular path expression.
Child and descendant axes
The child axis is the default axis in XPath. This means one does not need to use the child:: axis specification, but you can if you are feeling pedantic. One can reach deeper into the XML tree using the descendant:: and the descendant-or-self:: axes. The former excludes the context node and the latter includes it.
(: Sample document. The context node is the parent element. :)
<Test id="descendants">
  <parent>
    <X id="1"/>
    <X id="2"/>
    <Y id="3">
      <X id="3-1"/>
      <Y id="3-2"/>
      <X id="3-3"/>
    </Y>
    <X id="4"/>
    <Y id="5"/>
    <Z id="6"/>
    <X id="7"/>
    <X id="8"/>
    <Y id="9"/>
  </parent>
</Test>

(: Select all child elements named X. :)
X    (: same as child::X :)
Result: <X id="1"/> <X id="2"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the first X child element. :)
X[1]
Result: <X id="1"/>

(: Select the last X child element. :)
X[last()]
Result: <X id="8"/>

(: Select the first child element, provided it is an X. Otherwise empty. :)
*[1][self::X]
Result: <X id="1"/>

(: Select the last child element, provided it is an X. Otherwise empty. :)
*[last()][self::X]
Result: ()

*[last()][self::Y]
Result: <Y id="9"/>

(: Select all descendants named X. :)
descendant::X
Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node, if it is an X, and all descendants named X. :)
descendant-or-self::X
Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node and all descendant elements. :)
descendant-or-self::*
Result:
<parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y> <X id="4"/> <Y id="5"/> <Z id="6"/> <X id="7"/> <X id="8"/> <Y id="9"/> </parent>
<X id="1"/>
<X id="2"/>
<Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y>
<X id="3-1"/>
<Y id="3-2"/>
<X id="3-3"/>
<X id="4"/>
<Y id="5"/>
<Z id="6"/>
<X id="7"/>
<X id="8"/>
<Y id="9"/>
Sibling axes
The sibling axes include preceding-sibling:: and following-sibling::. As the names suggest, the preceding-sibling axis consists of siblings that precede the context node and the following-sibling axis consists of siblings that follow it. Siblings are, of course, child nodes that share the same parent. Most of the examples below use preceding-sibling::, but you should be able to work out the results for following-sibling:: without too much trouble.
Keep in mind that when using a positional path expression of the form preceding-sibling::*[1], you are referring to the immediately preceding sibling looking back from the context node and not the first sibling in document order. Some people get confused because the resulting sequence is in document order regardless of whether you use preceding-sibling:: or following-sibling::. Although not an axis expression per se, ../X is a way of saying, "select both preceding and following siblings named X as well as the context node, should it be an X." More formally speaking, it is an abbreviation for parent::node()/X. Note that (preceding-sibling::*)[1] and (following-sibling::*)[1] will select the first preceding/following sibling in document order.
<!-- Sample document. The context node is <A id="7"/>. -->
<Test id="preceding-siblings">
  <A id="1"/>
  <A id="2"/>
  <B id="3"/>
  <A id="4"/>
  <B id="5"/>
  <C id="6"/>
  <A id="7"/>
  <A id="8"/>
  <B id="9"/>
</Test>

(: Select all A sibling elements that precede the context node. :)
preceding-sibling::A
Result: <A id="1"/> <A id="2"/> <A id="4"/>

(: Select all A sibling elements that follow the context node. :)
following-sibling::A
Result: <A id="8"/>

(: Select all sibling elements that precede the context node. :)
preceding-sibling::*
Result: <A id="1"/> <A id="2"/> <B id="3"/> <A id="4"/> <B id="5"/> <C id="6"/>

(: Select the first preceding sibling element named A in reverse document order. :)
preceding-sibling::A[1]
Result: <A id="4"/>

(: The first preceding element in reverse document order, provided it is an A. :)
preceding-sibling::*[1][self::A]
Result: ()
(: If the context was <A id="8"/>, the result would be <A id="7"/>. :)

(: All preceding sibling elements that are not A elements. :)
preceding-sibling::*[not(self::A)]
Result: <B id="3"/> <B id="5"/> <C id="6"/>

(: For the following recipes use this document, with <A id="7"/> as the context as before. :)
<Test id="preceding-siblings">
  <A id="1">
    <A/>
  </A>
  <A id="2"/>
  <B id="3">
    <A/>
  </B>
  <A id="4"/>
  <B id="5"/>
  <C id="6"/>
  <A id="7"/>
  <A id="8"/>
  <B id="9"/>
</Test>

(: The element directly preceding the context, provided it has a child element A. :)
preceding-sibling::*[1][A]
Result: ()

(: The first element preceding the context that has a child A. :)
preceding-sibling::*[A][1]
Result: <B id="3"> ...

(: XPath 2.0 allows more flexibility to select elements with respect to namespaces. For these recipes the following XML document applies. The context node is <NS:B id="3"/>. :)
<Test xmlns:NS="http://www.ora.com/xstlcbk/1" xmlns:NS2="http://www.ora.com/xstlcbk/2">
  <NS:A id="1"/>
  <NS2:A id="2"/>
  <NS:B id="3"/>
  <NS2:B id="3"/>
</Test>

(: Select the preceding sibling elements of the context whose namespace is the namespace associated with prefix NS. :)
preceding-sibling::NS:*
Result: <NS:A id="1"/>

(: Select the preceding sibling elements of the context whose local name is A. :)
preceding-sibling::*:A
Result: <NS:A id="1"/>, <NS2:A id="2"/>
Parent and ancestor axes
The parent axis (parent::) refers to the parent of the context node. The expression parent::X should not be confused with ../X. The former will produce a sequence of exactly one element provided the parent of the context is X, or empty otherwise. The latter is shorthand for parent::node()/X, which will select all siblings of the context node named X, including the context itself, should it be an X.
One can navigate to higher levels of the XML tree (parents, grandparents, great-grandparents, and so on) using either ancestor:: or ancestor-or-self::. The former excludes the context and the latter includes it.
(: Select the parent of the context node, provided it is an X element. Empty otherwise. :)
parent::X

(: Select the parent element of the context node. Can only be empty if the context is the top-level element. :)
parent::*

(: Select the parent if it is in the namespace associated with the prefix NS. The prefix must be defined; otherwise, it is an error. :)
parent::NS:*

(: Select the parent, regardless of its namespace, provided the local name is X. :)
parent::*:X

(: Select all ancestor elements (including the parent) named X. :)
ancestor::X

(: Select the context, provided it is an X, and all ancestor elements named X. :)
ancestor-or-self::X
Preceding and following axes
The preceding and following axes have the potential to select a large number of nodes, because between them they consider almost every node in the document. The preceding axis contains all nodes that come before the context node in document order, excluding its ancestors. The following axis contains all nodes that come after the context node in document order, excluding its descendants. Also don't forget: both axes exclude namespace nodes and attributes.
(: All preceding element nodes named X. :)
preceding::X

(: The closest preceding element node named X. :)
preceding::X[1]

(: The furthest following element node named X. :)
following::X[last()]
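To make the exclusions concrete, here is a minimal XSLT 2.0 sketch. The sample tree and its element names (root, a, b, c, d, e) are invented for illustration, and the stylesheet assumes a processor that can start at a named template (called main here).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical sample document held in a temporary tree. -->
  <xsl:variable name="doc">
    <root>
      <a><b/></a>
      <c><d/></c>
      <e/>
    </root>
  </xsl:variable>

  <xsl:template name="main">
    <!-- Make c the context node. -->
    <xsl:for-each select="$doc/root/c">
      <!-- a and b come before c; the ancestor root is excluded. -->
      <xsl:value-of select="concat('preceding: ',
          string-join(preceding::*/name(), ' '), '&#10;')"/>
      <!-- e comes after c; the descendant d is excluded. -->
      <xsl:value-of select="concat('following: ',
          string-join(following::*/name(), ' '), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With c as the context, root appears on neither axis because it is an ancestor, and d appears on neither because it is a descendant.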
Discussion
XPath uses the notion of an axis to partition the document tree into subsets relative to some node called the context node. In general, these subsets overlap, but the ancestor, descendant, following, preceding, and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap, and together they contain all the nodes in the document. The context node is established by the XPath hosting language. In XSLT, the context is set via:
a template match (<xsl:template match="x"> ... </xsl:template>)
xsl:for-each
xsl:apply-templates
Effectively wielding the kinds of path expression shown in the solution is key to performing both simple and complex transformations. Experience with traditional programming languages sometimes leads to confusion and mistakes when using XPath. For example, I often used to catch myself writing something like <xsl:if test="preceding-sibling::X[1]"> </xsl:if> when I really intended <xsl:if test="preceding-sibling::*[1][self::X]"> </xsl:if>. This is probably because the latter is a less than intuitive way of saying "test if the immediately preceding sibling is an X."
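The difference between the two tests shows up as soon as another element sits between the context node and the nearest X. The following XSLT 2.0 sketch uses an invented three-element list and assumes the stylesheet is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical data: Y will be the context node. -->
  <xsl:variable name="doc">
    <list><X/><Z/><Y/></list>
  </xsl:variable>

  <xsl:template name="main">
    <xsl:for-each select="$doc/list/Y">
      <!-- true: an X precedes Y somewhere, even though Z sits in between. -->
      <xsl:value-of select="concat('preceding-sibling::X[1] exists: ',
          exists(preceding-sibling::X[1]), '&#10;')"/>
      <!-- false: the immediately preceding sibling is Z, not X. -->
      <xsl:value-of select="concat('preceding-sibling::*[1][self::X] exists: ',
          exists(preceding-sibling::*[1][self::X]), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```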
It is, of course, impossible to show every useful permutation of path expressions using axes. But if you understand the building blocks presented previously, you are well on your way to decoding the meaning of constructs such as preceding-sibling::X[1]/descendant::Z[A/B] or worse.
1.2. Filtering Nodes
Problem
You need to select nodes based on the data they contain instead of, or in addition to, their names or position.
Solution
Many of the mini-recipes in Recipe 1.1 used predicates to filter nodes, but those predicates were based strictly on the position of the node or the node name. Here we consider a variety of predicates that filter based on data content. In these examples, we use a simple child element path X before each predicate, but one could equally substitute any path expression for X, including those in Recipe 1.1.
Tip
In the following examples, we use the XPath 2.0 comparison operators (eq, ne, lt, le, gt, and ge) instead of the operators (=, !=, <, <=, >, and >=). This is because when one is comparing atomic values, the new operators are preferred. In XPath 1.0, you only have the latter operators, so make the appropriate substitution. The new operators were introduced in XPath 2.0 because they have simpler semantics and will probably be more efficient as a result. The complexity of the old operators comes when one considers cases where a sequence is on either side of the comparison. Recipe 1.8 covers this topic further.
Another point must be made for those working in XPath 2.0 because that version incorporates type information when a schema is available. That could lead some of the expressions below to have type errors. For example, X[@a = 10] is not the same as X[@a = '10'] when the attribute a has an integer type. Here we assume there is no schema and therefore all atomic values have the type untypedAtomic. You can find more on this topic in Recipes 1.9 and 1.10.
(: Select X child elements that have an attribute named a. :)
X[@a]

(: Select X children that have at least one attribute. :)
X[@*]

(: Select X children that have at least three attributes. :)
X[count(@*) > 2]

(: Select X children whose attributes sum to a value less than 7. :)
X[sum(for $a in @* return number($a)) < 7]
(: In XSLT 1.0 use sum(@*) < 7 :)

(: Select X children that have no attributes named a. :)
X[not(@a)]

(: Select X children that have no attributes. :)
X[not(@*)]

(: Select X children that have an attribute named a with value '10'. :)
X[@a eq '10']

(: Select X children that have a child named Z with value '10'. :)
X[Z eq '10']

(: Select X children that have a child named Z with value not equal to '10'. :)
X[Z ne '10']

(: Select X children if they have at least one child text node. :)
X[text()]

(: Select X children if they have a text node with at least one non-whitespace character. :)
X[text()[normalize-space(.)]]

(: Select X children if they have any child node. :)
X[node()]

(: Select X children if they contain a comment node. :)
X[comment()]

(: Select X children if they have an @a whose numerical value is less than 10. This expression will work equally well in XPath 1.0 and 2.0 regardless of whether @a is a string or a numeric type. :)
X[number(@a) < 10]

(: Select X if it has at least one preceding sibling named Z with an attribute y that is not equal to 10. The general operator != is used here because there may be several such siblings, and ne raises an error when an operand has more than one item. :)
X[preceding-sibling::Z/@y != '10']

(: Select X children whose string-value consists of a single space character. :)
X[. = ' ']

(: An odd way of getting an empty sequence! :)
X[false()]

(: Same as X. :)
X[true()]

(: X elements with exactly 5 children elements. :)
X[count(*) eq 5]

(: X elements with exactly 5 children nodes (including element, text, comment, and PI nodes but not attribute nodes). :)
X[count(node()) eq 5]

(: X elements with exactly 5 nodes of any kind. :)
X[count(@* | node()) eq 5]

(: The first X child, provided it has the value 'some text'; empty otherwise. :)
X[1][. eq 'some text']

(: Select all X children with the value 'some text' and return the first or empty if there is no such child. In simpler words, the first X child element that has the string-value 'some text'. :)
X[. eq 'some text'][1]
Discussion
As with Recipe 1.1, it is impossible to completely cover every interesting permutation of filtering predicates. However, mastering the themes exemplified above should help you develop almost any filtering expression you desire. Also consider that one can create more complex conditions using the logical operators and, or and the function not().
X[number(@a) > 5 and number(@a) < 10]
When using predicates with complex path expressions, you need to understand the effect of parentheses.
(: Select the first Y child of every X child of the context node. This expression can result in a sequence of more than one Y. :)
X/Y[1]

(: Select the sequence of nodes X/Y and then take the first. This expression can at most select one Y. :)
(X/Y)[1]
A computer scientist would say that the predicate operator [] binds more tightly than the path operator /.
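The effect of this binding can be seen in a small XSLT 2.0 sketch. The data (two X elements with two Y children each) and the id values are invented for illustration, and the stylesheet assumes it is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical data. -->
  <xsl:variable name="doc">
    <ctx>
      <X><Y id="a"/><Y id="b"/></X>
      <X><Y id="c"/><Y id="d"/></X>
    </ctx>
  </xsl:variable>

  <xsl:template name="main">
    <xsl:for-each select="$doc/ctx">
      <!-- [1] applies to the Y step, once per X: ids a and c. -->
      <xsl:value-of select="concat('X/Y[1]: ',
          string-join(X/Y[1]/@id, ' '), '&#10;')"/>
      <!-- [1] applies to the whole sequence X/Y: id a only. -->
      <xsl:value-of select="concat('(X/Y)[1]: ',
          string-join((X/Y)[1]/@id, ' '), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```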
1.3. Working with Sequences
Problem
You want to manipulate collections of arbitrary nodes and atomic values derived from an XML document or documents.
Solution
XPath 1.0
There is no notion of sequence in XPath 1.0 and hence these recipes are largely inapplicable. XPath 1.0 has node sets. There is an idiomatic way to construct the empty node set in XPath 1.0: select the parent of the root node, which never exists.
(: The empty node set :)
/..
XPath 2.0
(: The empty sequence constructor. :)
()

(: Sequence consisting of the single atomic item 1. :)
1

(: Use the comma operator to construct a sequence. Here we build a sequence of all X children of the context, followed by Y children, followed by Z children. :)
X, Y, Z

(: Use the to operator to construct ranges. :)
1 to 10

(: Here we combine comma with several ranges. :)
1 to 10, 100 to 110, 17, 19, 23

(: Variables and functions can be used as well. :)
1 to $x
1 to count(para)

(: Sequences do not nest so the following two sequences are the same. :)
((1,2,3), (4,5, (6,7), 8, 9, 10))
1,2,3,4,5,6,7,8,9,10

(: The to operator cannot create a decreasing sequence directly. :)
10 to 1    (: This sequence is empty! :)

(: You can accomplish the intended effect with the following. :)
for $n in 1 to 10 return 11 - $n

(: Remove duplicates from a sequence. :)
distinct-values($seq)

(: Return the size of a sequence. :)
count($seq)

(: Test if a sequence is empty. :)
empty($seq)    (: prefer over count($seq) eq 0 :)

(: Locate the positions of an item in a sequence. index-of produces a sequence of integers, one for every item in the first arg that is eq to the second. :)
index-of($seq, $item)

(: Extract subsequences. :)
(: Up to 3 items from $seq, starting with the second. :)
subsequence($seq, 2, 3)

(: All items from $seq at position 3 to the end of the sequence. :)
subsequence($seq, 3)

(: Insert a sequence, $seq2, before the 3rd item in an input sequence, $seq1. :)
insert-before($seq1, 3, $seq2)

(: Construct a new sequence that contains all the items of $seq1 except the 3rd. :)
remove($seq1, 3)

(: If you need to remove several elements, you might consider an expression like the following. :)
$seq1[not(position() = (1,3,5))]
$seq1[position() gt 3 and position() lt 7]
Discussion
In XPath 2.0, every data item (value) is a sequence. Thus, the atomic value 1 is just as much a sequence as the result of the expression (1 to 10). Another way of saying this is that every XPath 2.0 expression evaluates to a sequence. A sequence can contain zero or more items, and these items can be nodes, atomic values, or mixtures of both. Order is significant when comparing sequences. You refer to the individual items of a sequence starting at position 1 (not 0, as someone with a C/Java background might expect).
XPath 1.0 does not have sequences but rather node sets. Node sets are not as tidy a concept as sequences, but in many cases, the distinction is irrelevant. For example, any XPath 1.0 expression that uses the function count() should behave the same in 2.0. (The empty() function is new in 2.0; the 1.0 idiom for the same test is not().) The advantage of XPath 2.0 is that a sequence is a first-class construct that can be explicitly constructed and manipulated using a variety of new XPath 2.0 functions. The recipes in this section introduce many important sequence idioms, and you will find many others sprinkled through the recipes of this book.
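These properties of sequences can be checked with a short XSLT 2.0 sketch. The variable name seq and the values are arbitrary, and the stylesheet assumes it is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="seq" select="((1, 2), (), (3, (4, 5)))"/>
    <!-- Nesting is flattened, so the count is 5. -->
    <xsl:value-of select="concat('count($seq): ', count($seq), '&#10;')"/>
    <!-- Positions start at 1, so this is the item 1. -->
    <xsl:value-of select="concat('$seq[1]: ', $seq[1], '&#10;')"/>
    <!-- A single atomic value is a sequence of length one. -->
    <xsl:value-of select="concat('count(1): ', count(1), '&#10;')"/>
    <!-- Order is significant: (1, 2) and (2, 1) are not deep-equal. -->
    <xsl:value-of select="concat('deep-equal((1, 2), (2, 1)): ',
        deep-equal((1, 2), (2, 1)), '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```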
1.4. Shrinking Conditional Code with If Expressions
Problem
Your complex XSLT code is too verbose due to the high overhead of XML when expressing simple if-then-else conditions.
Solution
XPath 1.0
There are a few tricks you can play in XPath 1.0 to avoid using XSLT's verbose xsl:choose in simple situations. These tricks rely on the fact that false converts to 0 and true to 1 when used in a mathematical context.
So, for example, min, max, and absolute value can be calculated directly in XPath 1.0. In these examples, assume $x and $y contain integers.
(: min :)
($x <= $y) * $x + ($y < $x) * $y

(: max :)
($x >= $y) * $x + ($y > $x) * $y

(: abs :)
(1 - 2 * ($x < 0)) * $x
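As a worked illustration, the following XSLT 1.0 sketch evaluates the min and abs expressions for two arbitrary values (3 and -7, chosen only for the example). Inside a stylesheet the < character must be written as &lt; in attribute values. The trick depends on the XPath 1.0 conversion of booleans to numbers; under XPath 2.0 rules without compatibility mode, multiplying a boolean is a type error.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Arbitrary sample values. -->
  <xsl:variable name="x" select="3"/>
  <xsl:variable name="y" select="-7"/>

  <!-- Any input document will do; only the variables are used. -->
  <xsl:template match="/">
    <!-- min: ($x <= $y) is 0 and ($y < $x) is 1, so 0 * 3 + 1 * (-7) = -7. -->
    <xsl:value-of select="($x &lt;= $y) * $x + ($y &lt; $x) * $y"/>
    <xsl:text>&#10;</xsl:text>
    <!-- abs of $y: ($y < 0) is 1, so (1 - 2) * (-7) = 7. -->
    <xsl:value-of select="(1 - 2 * ($y &lt; 0)) * $y"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```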
XPath 2.0
For the simple cases in the XPath 1.0 section (min, max, abs), there are now built-in XPath functions. For other simple conditionals, use the new conditional if expression.
(: Default the value of a missing attribute to 10. :)
if (@x) then @x else 10

(: Default the value of a missing element to 'unauthorized'. :)
if (password) then password else 'unauthorized'

(: Guard against division by zero. :)
if ($d ne 0) then $x div $d else 0

(: A para element's text if it contains at least one non-whitespace character; otherwise, a single space. :)
if (normalize-space(para)) then string(para) else ' '
Discussion
If you are a veteran XSLT 1.0 programmer, you probably cringe every time you need to add some conditional code to a template. I know I do, and often go through pains to exploit XSLT's pattern-matching constructs to minimize conditional code. This is not because such code is more complicated or inefficient in XSLT but rather because it is so darn verbose. A simple xsl:if is not that bad, but if you need to express if-then-else logic, you are now forced to use the bulkier xsl:choose. Yech!
In XSLT 2.0, there is an alternative, but it is delivered in XPath 2.0 rather than XSLT 2.0 proper. On first exposure, one may get the impression that XPath was somehow bastardized via the introduction of what procedural programmers call flow-of-control statements. However, once you begin to use XPath 2.0 in its full glory, you should quickly conclude that both XPath and XSLT are bettered by these enhancements. Further, the XPath 2.0 conditional expression does not deprecate the xsl:if element but rather reduces the need to use it in just those cases where it is most awkward. As an illustration, compare the following snippets:
<!-- XSLT 1.0 -->
<xsl:variable name="size">
  <xsl:choose>
    <xsl:when test="$x > 3">big</xsl:when>
    <xsl:otherwise>small</xsl:otherwise>
  </xsl:choose>
</xsl:variable>

<!-- XSLT 2.0 -->
<xsl:variable name="size" select="if ($x gt 3) then 'big' else 'small'"/>
I think most readers will prefer the latter XPath 2.0 solution over the former XSLT 1.0 one.
One important fact about the XPath conditional expression is that the else is not optional. C programmers can appreciate this by comparing it to the a ? b : c expression in that language. Often one will use the empty sequence () when there is no other sensible value for the else part of the expression.
Conditional expressions are useful for defaulting in the absence of a schema that provides defaults.
(: Defaulting the value of an optional attribute :)
if (@optional) then @optional else 'some-default'

(: Defaulting the value of an optional element :)
if (optional) then optional else 'some-default'
Handling undefined or undesirable results in expressions is also a good application. In this example we have an application-specific reason to prefer 0 rather than number('Infinity') as the result.
if ($divisor ne 0) then $dividend div $divisor else 0
You can also create conditions that are more complex. The following code decodes an enumerated list:
if (size eq 'XXL') then 50
else if (size eq 'XL') then 45
else if (size eq 'L') then 40
else if (size eq 'M') then 34
else if (size eq 'S') then 32
else if (size eq 'XS') then 29
else -1
However, in this case, you might find a solution using sequences to be cleaner especially if you replace the literal sequences with variables that might be initialized from an external XML file.
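(50,45,40,34,32,29,-1)[(index-of(('XXL', 'XL', 'L', 'M', 'S', 'XS'), $size), 7)[1]]
Here we are assuming that $size has been bound to the single size child element of the context (for example, with xsl:variable). The variable is needed because inside the predicate the context item is an integer from the sequence being filtered, so the path size cannot be evaluated there. If the context can have more than one size child, bind size[1] instead, because index-of accepts only a single search item. We are also relying on the fact that index-of returns an empty sequence when the search item is not found, which we concatenate with 7 to handle the else case.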
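The sequence-based lookup can be tried out with the following XSLT 2.0 sketch. The variable names names, values, and size are chosen for this example, as is the unknown test value 'huge'; the stylesheet assumes it is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Lookup tables; the last value is the default. -->
  <xsl:variable name="names" select="('XXL', 'XL', 'L', 'M', 'S', 'XS')"/>
  <xsl:variable name="values" select="(50, 45, 40, 34, 32, 29, -1)"/>

  <xsl:template name="main">
    <!-- Try one known size and one unknown size. -->
    <xsl:for-each select="('L', 'huge')">
      <!-- Bind the search item first: inside the predicate below,
           the context item is an integer from $values. -->
      <xsl:variable name="size" select="."/>
      <!-- 'L' is at position 3, giving 40; 'huge' is not found,
           so position 7 is used, giving -1. -->
      <xsl:value-of select="concat($size, ': ',
          $values[(index-of($names, $size), 7)[1]], '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Keeping the two tables in variables is what makes it easy to load them from an external XML file later.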
1.5. Eliminating Recursion with for Expressions
Problem
You want to derive an output sequence from an input sequence where each item in the output is an arbitrarily complex function of the input and the sizes of each sequence are not necessarily the same.
Solution
XPath 1.0
Not applicable in 1.0. Use a recursive XSLT template.
XPath 2.0
Use XPath 2.0's for expression. Here we show four cases demonstrating how the for expression can map sequences of differing input and output sizes.
Aggregation
(: Sum of squares. :)
sum(for $x in $numbers return $x * $x)

(: Average of squares. :)
avg(for $x in $numbers return $x * $x)
Mapping
(: Map a sequence of words in all paragraphs to a sequence of word lengths. :)
for $x in //para/tokenize(., ' ') return string-length($x)

(: Map a sequence of words in a paragraph to a sequence of word lengths for words greater than three letters. :)
for $x in //para/tokenize(., ' ') return if (string-length($x) gt 3) then string-length($x) else ()

(: Same as above but with a condition on the input sequence. :)
for $x in //para/tokenize(., ' ')[string-length(.) gt 3] return string-length($x)
Generating
(: Generate a sequence of squares of the first 100 integers. :)
for $i in 1 to 100 return $i * $i

(: Generate a sequence of squares in reverse order. :)
for $i in 0 to 10 return (10 - $i) * (10 - $i)
Discussion
As I indicated in Recipe 1.4, the addition of control flow constructs into an expression language like XPath might at first be perceived as odd or even misguided. You will quickly overcome your doubts, however, when you experience the liberating power of these XPath 2.0 constructs. This is especially true for the XPath 2.0 for expression.
The power of for becomes most apparent when one considers how it can be applied to reduce many complicated recursive XSLT 1.0 solutions to just a single XPath 2.0 expression. Consider the problem of computing sums in XSLT 1.0. If all you need is a simple sum, there is no problem because the built-in XPath 1.0 sum function will do fine. However, if you need to compute the sum of squares, you are forced to write a larger, more awkward, and less transparent recursive template. In fact, a good portion of the first edition of this book was recipes for canned solutions to these recursive gymnastics. With XPath 2.0, a sum of squares becomes nothing more than sum(for $x in $numbers return $x * $x), where $numbers contains the sequence of numbers we wish to sum over. Think of the trees that I could have saved if this facility had been in XPath 1.0!
However, the for expression is hiding even more power. You are not limited to just one iteration variable. Several variables can be combined to create nested loops that create sequences from interrelated nodes in a complex document.
(: Return a sequence consisting of para ids and the ids those para elements reference. :)
for $s in /*/section,
    $p in $s/para,
    $r in $p/ref
return ($p/@id, $r)
You should note that, other than being more compact, the preceding expression is not semantically different from the following:
for $s in /*/section return
  for $p in $s/para return
    for $r in $p/ref return ($p/@id, $r)
You should also note that there is no need to use a nested for when the sequence you are producing is more elegantly represented by a traditional path expression.
(: This use of for is just a long-winded way of writing /*/section/para/ref. :)
for $s in /*/section,
    $p in $s/para,
    $r in $p/ref
return $r
Sometimes you might want to know the position of each item in a sequence as you process it. You cannot use the position() function as you would in an xsl:for-each because an XPath for expression does not alter the context position. However, you can achieve the effect you want with the following expression. The parentheses after return are required; without them the comma would end the for expression and leave $pos out of scope.
for $pos in 1 to count($sequence), $item in $sequence[$pos] return ($item, $pos)
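A minimal XSLT 2.0 sketch of this positional idiom follows. The three-item sequence is invented for the example, and the stylesheet assumes it is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="sequence" select="('a', 'b', 'c')"/>
    <!-- Pair every item with its position. The parentheses around
         ($item, $pos) keep both inside the return clause. -->
    <xsl:value-of select="for $pos in 1 to count($sequence),
                              $item in $sequence[$pos]
                          return ($item, $pos)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 2.0, xsl:value-of separates the items with single spaces, so the result lists each item followed by its position: a 1 b 2 c 3.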
1.6. Taming Complex Logic Using Quantifiers
Solution
XPath 1.0
If the condition is based on equality, then the semantics of the = and != operators in XPath 1.0 and 2.0 will suffice.
(: True if at least one section is referenced. :)
//section/@id = //ref/@idref

(: True if all section elements are referenced by some ref element. :)
count(//section) = count(//section[@id = //ref/@idref])
XPath 2.0
In XPath 2.0, use some and every expressions to do the same.
(: True if at least one section is referenced. :)
some $id in //section/@id satisfies $id = //ref/@idref

(: True if all section elements are referenced by some ref element. :)
every $id in //section/@id satisfies $id = //ref/@idref
However, you can go quite a bit further with less effort in XPath 2.0.
(: There exists a section that references every section except itself. :)
some $s in //section satisfies
  every $id in //section[@id ne $s/@id]/@id satisfies $id = $s/ref/@idref

(: $sequence2 is a sub-sequence of $sequence1 :)
count($sequence2) <= count($sequence1) and
every $pos in 1 to count($sequence1),
      $item1 in $sequence1[$pos],
      $item2 in $sequence2[$pos]
  satisfies $item1 = $item2
If you remove the count check in the preceding expression, it would assert that at least the first count($sequence1) items in $sequence2 are the same as corresponding items in $sequence1.
Discussion
The semantics of =, !=, <, >, <=, >= in XPath 1.0 and 2.0 sometimes surprise the uninitiated when one of the operands is a sequence or XPath 1.0 node set. This is because the operators evaluate to true if there is at least one pair of values from each side of the expression which compare according to the relation. In XPath 1.0, this can sometimes work to your advantage, as we have shown previously, but other times it can leave your head spinning and you longing to be back in the 5th grade where math made sense. For example, one would guess that $x = $x should always be true, but if $x is the empty sequence, it is not! This follows from the fact that you cannot find a pair of items within each empty sequence that are equal.
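The "at least one pair" rule is easy to verify with literal sequences. The following XSLT 2.0 sketch uses arbitrary variable names (empty, a, b) and assumes the stylesheet is started at a named template called main; some processors may warn that comparing empty sequences is always false.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="empty" select="()"/>
    <xsl:variable name="a" select="(1, 2)"/>
    <xsl:variable name="b" select="(2, 3)"/>
    <!-- false: there is no pair of items to compare. -->
    <xsl:value-of select="concat('$empty = $empty: ',
        $empty = $empty, '&#10;')"/>
    <!-- true: the pair 2, 2 is equal. -->
    <xsl:value-of select="concat('$a = $b: ', $a = $b, '&#10;')"/>
    <!-- also true: the pair 1, 2 is unequal. -->
    <xsl:value-of select="concat('$a != $b: ', $a != $b, '&#10;')"/>
    <!-- false: every states the all-items reading explicitly,
         and 2 does occur in $b. -->
    <xsl:value-of select="concat('no item of $a occurs in $b: ',
        (every $i in $a satisfies not($i = $b)), '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that $a = $b and $a != $b are both true for the same operands, which is why a quantified expression is the safer way to state "all" or "none" conditions.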
1.7. Using Set Operations
Solution
Discussion
In XPath 2.0, node sets are replaced by sequences. Unlike node sets, sequences are ordered and can contain duplicate items. However, when using the XPath 2.0 set operations, duplicates and ordering are ignored so sequences behave just like sets. The result of a set operation will never contain duplicates even if the inputs did.
The except operator is used in an XPath 2.0 idiom for selecting all attributes but a given set.
(: All attributes except @a. :)
@* except @a

(: All attributes except @a and @b. The parentheses are needed because the comma operator binds more loosely than except. :)
@* except (@a, @b)
In XPath 1.0, one needs the following more awkward expression:
@*[local-name(.) != 'a' and local-name(.) != 'b']
Interestingly enough, XPath only allows set operations over sequences of nodes. Atomic values are not allowed. This is because the set operations are over node identity and not value. One can get the effect of sets of values using the following XPath 2.0 expressions. For XPath 1.0, you will need to use XSLT recursion. See Chapter 8.
(: union :)
distinct-values( ($items1, $items2) )

(: intersection :)
distinct-values( $items1[. = $items2] )

(: difference :)
distinct-values( $items1[not(. = $items2)] )
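The contrast between the node-based operators and the value-based idioms can be sketched as follows. This is an illustration rather than the recipe's own solution: the items document, the variable names, and the values are invented, and the stylesheet assumes it is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical data. -->
  <xsl:variable name="doc">
    <items><i id="1"/><i id="2"/><i id="3"/><i id="4"/></items>
  </xsl:variable>

  <xsl:template name="main">
    <!-- Two overlapping node sequences: ids 1-3 and ids 2-4. -->
    <xsl:variable name="s1" select="$doc/items/i[position() le 3]"/>
    <xsl:variable name="s2" select="$doc/items/i[position() ge 2]"/>
    <!-- Node operators compare by identity: ids 1 2 3 4. -->
    <xsl:value-of select="'union:', ($s1 union $s2)/@id"/>
    <xsl:text>&#10;</xsl:text>
    <!-- ids 2 3 -->
    <xsl:value-of select="'intersect:', ($s1 intersect $s2)/@id"/>
    <xsl:text>&#10;</xsl:text>
    <!-- id 1 -->
    <xsl:value-of select="'except:', ($s1 except $s2)/@id"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Atomic values need the value-based idioms instead. -->
    <xsl:variable name="v1" select="(1, 2, 2, 3)"/>
    <xsl:variable name="v2" select="(3, 4)"/>
    <!-- Only the value 3 occurs in both. -->
    <xsl:value-of select="'value intersection:',
        distinct-values($v1[. = $v2])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The values 1 and 2, each reported once. -->
    <xsl:value-of select="'value difference:',
        distinct-values($v1[not(. = $v2)])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```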
See Also
Recipes 9.1 and 9.2 have more examples of set operations.
1.8. Using Node Comparisons
Solution
XPath 1.0
In these examples, assume $x and $y each contain a single node from the same document. Also, recall that document order means the sequence in which nodes appear within a document.
(: Test if $x and $y are the same exact node. :)
generate-id($x) = generate-id($y)
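(: You can also take advantage of the | operator's removal of duplicates. :)
count($x|$y) = 1
(: Test if $x precedes $y in document order - note that this does not work if $x
or $y are attributes. The second clause tests by node identity whether $x is an
ancestor of $y; comparing with = would compare string-values instead. :)
count($x/preceding::node()) < count($y/preceding::node()) or
count($x | $y/ancestor::node()) = count($y/ancestor::node())
(: Test if $x follows $y in document order - note that this does not work if $x
or $y are attributes. The second clause tests whether $y is an ancestor of $x. :)
count($x/following::node()) < count($y/following::node()) or
count($y | $x/ancestor::node()) = count($x/ancestor::node())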
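For comparison, XPath 2.0 provides the node comparison operators is, << and >>. The following XSLT 2.0 sketch is an illustration under stated assumptions: the two-element document is invented, and the stylesheet assumes it is started at a named template called main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical data: a comes before b in document order. -->
  <xsl:variable name="doc">
    <r><a/><b/></r>
  </xsl:variable>

  <xsl:template name="main">
    <xsl:variable name="x" select="$doc/r/a"/>
    <xsl:variable name="y" select="$doc/r/b"/>
    <!-- is tests node identity: true, then false. -->
    <xsl:value-of select="concat('$x is $x: ', $x is $x, '&#10;')"/>
    <xsl:value-of select="concat('$x is $y: ', $x is $y, '&#10;')"/>
    <!-- The order operators are written with entities here because
         the less-than sign must be escaped in attribute values. -->
    <!-- true: $x precedes $y. -->
    <xsl:value-of select="concat('$x precedes $y: ',
        $x &lt;&lt; $y, '&#10;')"/>
    <!-- false: $x does not follow $y. -->
    <xsl:value-of select="concat('$x follows $y: ',
        $x &gt;&gt; $y, '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```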
Discussion
The new XPath 2.0 node comparison operators are likely to be more efficient and certainly easier to understand than the XPath 1.0 counterparts. However, if you are using XSLT 2.0, you will not find too many situations where these operators are required. There are many situations where you think you need << or >> when the xsl:for-each-group element is preferred. See Recipe 6.2 for examples.
1.9. Coping with XPath 2.0’s Extended Type System
Solution
Most incompatibilities between XPath/XSLT 1.0 and 2.0 come from type errors. This is true regardless of whether a schema is present or not. You can eliminate many problems encountered in porting legacy XSLT 1.0 to XSLT 2.0 with respect to XPath differences by running in 1.0 compatibility mode.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- ... -->
</xsl:stylesheet>
In my opinion, eventually you will want to stop using compatibility mode. XPath 2.0 provides several facilities for dealing with type conversions. First, you can use conversion functions explicitly.
(: Convert the first X child of the context to a number. :)
number(X[1]) + 17
(: Convert a number in $n to a string. :)
concat("id-", string($n))
XPath 2.0 also provides type constructors so you can explicitly control the interpretation of a string.
(: Construct a date from a string. :)
xs:date("2005-06-01")
(: Construct doubles from strings. :)
xs:double("1.1e8") + xs:double("23000")
Finally, XPath 2.0 has the operators castable as, cast as, and treat as. Most of the time, you want to use the first two.
if ($x castable as xs:date)
then $x cast as xs:date
else xs:date("1970-01-01")
The operator treat as is not a conversion per se but rather an assertion that tells the XPath processor that you promise a value will conform to a specified type at runtime. If this turns out not to be the case, then a type error will occur. XPath 2.0 added treat as so XPath implementers could perform static (compile-time) type checking in addition to dynamic type checking while allowing programmers to selectively disable static type checks. XSLT 2.0 implementations that perform static type checking will likely be rare, so you can ignore treat as for the time being. It is far more likely to arise in higher-end XQuery processors that do static type checking to facilitate various optimizations.
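To make the distinction between converting and asserting concrete, here is a minimal sketch. The variable $qty is hypothetical and is assumed to hold a single atomic value.

```xpath
(: cast as converts the value: the string "42" becomes the integer 42,
   so the whole expression evaluates to 43. :)
("42" cast as xs:integer) + 1

(: treat as converts nothing; it only asserts the type. This succeeds
   when $qty already is an xs:integer and raises an error at runtime
   otherwise, even if $qty is a string such as "42". :)
($qty treat as xs:integer) + 1
```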
Discussion
Running in 1.0 compatibility mode with an XSLT 2.0 processor does not mean you cannot use any of the new 2.0 features. It simply enables certain XPath 1.0 conversion rules.
It allows non-numeric types used in a context where numbers are expected to convert automatically to numbers via atomization followed by application of the number() function.
(: In compatibility mode, the following evaluates to 18.1 but is a type error in 2.0. :)
"1.1" + "17"
It allows non-string types used in a context where strings are expected to convert automatically to strings via atomization followed by application of the string() function.
(: In compatibility mode, the following evaluates to 2 but is a type error in 2.0. :)
string-length(1 + 2 + 3 + 4 + 5)
It automatically discards all but the first item of a sequence of two or more items when the sequence is used in a context where a singleton is expected. This often happens when one is passing the result of a path expression to a function.
<poem>
  <line>There once was a programmer from Nantucket.</line>
  <line>Who liked his bits in a bucket.</line>
  <line>He said with a grin</line>
  <line>and drops of coffee on his chin,</line>
  <line>"If XSLT had a left-shift, I would love it!"</line>
</poem>
(: In compatibility mode, both expressions evaluate to 43 but the first is a type error in 2.0. :)
string-length(/poem/line)
string-length(/poem/line[1])
1.10. Exploiting XPath 2.0’s Extended Type System
Solution
If you validate your documents against a schema, the resulting nodes become annotated with type information. You can then test for these types in XPath 2.0 (and while matching templates in XSLT 2.0).
(: Test if all invoiceDate elements have been validated as dates. The + occurrence indicator lets the test accept one or more elements. :)
if (order/invoiceDate instance of element(*, xs:date)+)
then "invoicing complete"
else "invoicing incomplete"
Warning
instance of is only useful in the presence of schema validation. In addition, it is not the same as castable as. For instance, 10 castable as xs:positiveInteger is always true but 10 instance of xs:positiveInteger is never true because integer literals are labeled as xs:integer.
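The same distinction shows up with plain strings. In this brief sketch, castable as asks whether a conversion would succeed, while instance of asks what the value already is:

```xpath
(: True: the string "10" can be converted to an integer. :)
"10" castable as xs:integer

(: False: the value is still an xs:string until it is actually cast. :)
"10" instance of xs:integer

(: True: after the cast, the value really is an xs:integer. :)
("10" cast as xs:integer) instance of xs:integer
```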
However, the benefit of validation is not simply the ability to test types using instance of but rather the safety and convenience of knowing that there will be no type error surprises once validation has passed. This can lead to more concise stylesheets.
(: Without validation, you should code like this. :)
for $order in Order
return xs:date($order/invoiceDate) - xs:date($order/createDate)
(: If you know all date elements have been validated, you can dispense with the xs:date constructor. :)
for $order in Order
return $order/invoiceDate - $order/createDate
Discussion
My personal preference is to use XML Schemas as specification documents and not validation tools. Therefore, I tend to write XSLT transformations in ways that are resilient to type errors and use explicit conversions where needed. Stylesheets written in this manner will work in the presence of validation or not.
Once you begin to write stylesheets that depend on validation, you are locked into implementations that perform validation. On the other hand, if your company standards say all XML documents will be schema-validated before processing, then you can simplify your XSLT based on assurances that certain data types will appear in certain situations.