Chapter 1. XPath

Neo, sooner or later you’re going to realize just as I did that there’s a difference between knowing the path and walking the path.

Morpheus (The Matrix)

Introduction

XPath is an expression language that is fundamental to XML processing. You can no more master XSLT without mastering XPath than you can master English without learning the alphabet. Several readers of the first edition of XSLT Cookbook took me to task for not covering XPath. This chapter has been added partly to appease them but more so due to the greatly increased power of the latest XPath 2.0 specifications. However, many of these recipes are applicable to XPath 1.0 as well.

In XSLT 1.0, XPath plays three crucial roles. First, it is used within templates for addressing into the document to extract data as it is being transformed. Second, XPath syntax is used as a pattern language in the matching rules for templates. Third, it is used to perform simple math and string manipulations via built-in XPath operators and functions.

XSLT 2.0 retains and strengthens this intimate connection with XPath 2.0 by drawing heavily on the new computational abilities of XPath 2.0. In fact, one can make a reasonable argument that the enhanced capabilities of XSLT 2.0 stem largely from the advances in XPath 2.0. The new XPath 2.0 facilities include sequences, regular expressions, conditional and iterative expressions, and an enhanced XML Schema-compliant type system, as well as a large number of new built-in functions.

Each recipe in this chapter is a collection of mini-recipes for solving certain classes of XPath problems that often arise while using XSLT. We annotate each XPath expression with the XPath 2.0 commenting convention (: comment :) but users of XPath/XSLT 1.0 should be aware that these comments are not legal syntax. When we are showing the result of an XPath evaluation that is empty, we will write (), which happens to be the way one writes a literal empty sequence in XPath 2.0.

1.1. Effectively Using Axes

Problem

You need to select nodes in an XML tree in ways that consider complex relationships within the hierarchical structure.

Solution

Each of the following solutions is organized around related sets of axes. For each group, a sample XML document is presented with the context node in bold. An explanation of the effect of evaluating the path is provided, along with an indication of the nodes that will be selected with respect to the highlighted context. In some cases, the solution will consider other nodes as the context to illustrate subtleties of the particular path expression.

Child and descendant axes

The child axis is the default axis in XPath. This means one does not need to use the child:: axis specification, but you can if you are feeling pedantic. One can reach deeper into the XML tree using the descendant:: and the descendant-or-self:: axes. The former excludes the context node and the latter includes it.

<Test id="descendants">
   <parent>
      <X id="1"/>
      <X id="2"/>
      <Y id="3">
        <X id="3-1"/>
        <Y id="3-2"/>
        <X id="3-3"/>
      </Y>
      <X id="4"/>
      <Y id="5"/>
      <Z id="6"/>
      <X id="7"/>
      <X id="8"/>
      <Y id="9"/>
    </parent>
</Test>

(: Select all child elements named X. The context node is the <parent> element. :)
X   (: same as child::X :)

Result: <X id="1"/> <X id="2"/> <X id="4"/> <X id="7"/> <X id="8"/>

(:Select the first X child element:)

X[1]    

Result: <X id="1"/>

(:Select the last X child element:)

X[last()]    

Result: <X id="8"/>


(:Select the first child element, provided it is an X. Otherwise empty:)

*[1][self::X]    

Result: <X id="1"/>


(:Select the last child, provided it is an X. Otherwise empty:)

*[last()][self::X]    

Result: ()

*[last()][self::Y]    

Result: <Y id="9"/>

(: Select all descendants named X :)
descendant::X

Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node, if it is an X, and all descendants named X :)

descendant-or-self::X

Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node and all descendant elements :)

descendant-or-self::*
Result: <parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X 
id="3-3"/> </Y> <X id="4"/> <Y id="5"/> <Z id="6"/> <X id="7"/> <X id="8"/> <Y 
id="9"/> </parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X 
id="3-3"/> </Y> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> <X id="4"/> <Y id="5"/> <Z 
id="6"/> <X id="7"/> <X id="8"/> <Y id="9"/>
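As a cross-check, the child and descendant selections above can be reproduced with Python's standard xml.etree.ElementTree module, which implements a small subset of XPath 1.0. This is only a sketch under a few assumptions: the context node is taken to be the parent element, the ids helper is a name introduced here for illustration, and descendant::X is spelled .//X because ElementTree has no named axes.

```python
import xml.etree.ElementTree as ET

# The sample document from this recipe.
DOC = """
<Test id="descendants">
   <parent>
      <X id="1"/>
      <X id="2"/>
      <Y id="3">
        <X id="3-1"/>
        <Y id="3-2"/>
        <X id="3-3"/>
      </Y>
      <X id="4"/>
      <Y id="5"/>
      <Z id="6"/>
      <X id="7"/>
      <X id="8"/>
      <Y id="9"/>
    </parent>
</Test>
"""


def ids(nodes):
    """Hypothetical helper: list the id attribute of each selected element."""
    return [node.get("id") for node in nodes]


# Assumption: the context node is the parent element.
context = ET.fromstring(DOC.strip()).find("parent")

child_x = ids(context.findall("X"))         # child::X
first_x = ids(context.findall("X[1]"))      # first X child
last_x = ids(context.findall("X[last()]"))  # last X child
all_x = ids(context.findall(".//X"))        # descendant::X

print(child_x)  # ['1', '2', '4', '7', '8']
print(first_x)  # ['1']
print(last_x)   # ['8']
print(all_x)    # ['1', '2', '3-1', '3-3', '4', '7', '8']
```

ElementTree does not support the self:: axis, so expressions such as *[1][self::X] need a full XPath processor.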

Sibling axes

The sibling axes include preceding-sibling:: and following-sibling::. As the names suggest, the preceding-sibling axis consists of siblings that precede the context node and the following-sibling axis consists of siblings that follow it. Siblings are, of course, child nodes that share the same parent. Most of the examples below use preceding-sibling::, but you should be able to work out the results for following-sibling:: without too much trouble.

Keep in mind that when using a positional path expression of the form preceding-sibling::*[1], you are referring to the immediately preceding sibling looking back from the context node and not the first sibling in document order. Some people get confused because the resulting sequence is in document order regardless of whether you use preceding-sibling:: or following-sibling::. Although not an axis expression per se, ../X is a way of saying: select both preceding and following siblings named X as well as the context node, should it be an X. More formally speaking, it is an abbreviation for parent::node()/X. Note that (preceding-sibling::*)[1] and (following-sibling::*)[1] will select the first preceding/following sibling in document order.

<!-- Sample document. The context node is <A id="7"/>. -->
<Test id="preceding-siblings">
    <A id="1"/>
    <A id="2"/>
    <B id="3"/>
    <A id="4"/>
    <B id="5"/>
    <C id="6"/>
    <A id="7"/>
    <A id="8"/>
    <B id="9"/>
</Test>

(:Select all A sibling elements that precede the context node. :)
preceding-sibling::A

Result: <A id="1"/> <A id="2"/> <A id="4"/>

(:Select all A sibling elements that follow the context node. :)
following-sibling::A

Result: <A id="8"/> 

(:Select all sibling elements that precede the context node. :)
preceding-sibling::*    

Result: <A id="1"/> <A id="2"/> <B id="3"/> <A id="4"/> <B id="5"/> <C id="6"/>


(: Select the first preceding sibling element named A in reverse document order. :)
preceding-sibling::A[1]    

Result: <A id="4"/>

(: The first preceding element in reverse document order, provided it is an A. :)
preceding-sibling::*[1][self::A]    

Result: () 
(: If the context was <A id="8"/>, the result would be <A id="7"/> :)

(:All preceding sibling elements that are not A elements:)
preceding-sibling::*[not(self::A)]

Result: <B id="3"/> <B id="5"/> <C id="6"/>
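The difference between position along a reverse axis and position in the resulting sequence can be modeled with plain Python lists. This is an illustrative sketch, not an XPath engine: the siblings are represented by hypothetical labels such as A7 (element name plus id), and the context node is assumed to be A7.

```python
# Siblings of the sample document in document order, labeled name + id.
siblings = ["A1", "A2", "B3", "A4", "B5", "C6", "A7", "A8", "B9"]
context = siblings.index("A7")  # assumption: the context node is <A id="7"/>

# As a finished result, preceding-sibling::* is always in document order.
preceding = siblings[:context]

# Inside a step, a positional predicate counts along the axis direction.
# preceding-sibling is a reverse axis, so position 1 is the nearest sibling.
nearest = list(reversed(preceding))[0]  # preceding-sibling::*[1]
nearest_a = [s for s in reversed(preceding) if s.startswith("A")][0]  # preceding-sibling::A[1]

# Parentheses complete the step first, so position 1 is then counted
# in document order.
first_in_doc = preceding[0]  # (preceding-sibling::*)[1]

# Predicate order matters: *[1][self::A] only ever tests the nearest sibling.
nearest_if_a = [s for s in [nearest] if s.startswith("A")]  # preceding-sibling::*[1][self::A]

print(nearest, nearest_a, first_in_doc, nearest_if_a)  # C6 A4 A1 []
```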

(: For the following recipes use this document. The context node is again <A id="7"/>. :)

<Test id="preceding-siblings">
        <A id="1">
            <A/>
        </A>
        <A id="2"/>
        <B id="3">
        <A/>
        </B>
        <A id="4"/>
        <B id="5"/>
        <C id="6"/>
        <A id="7"/>
        <A id="8"/>
        <B id="9"/>
</Test>

(: The element directly preceding the context provided it has a child element A :)
preceding-sibling::*[1][A]

Result: ()

(: The first element preceding the context that has a child A :)
preceding-sibling::*[A][1]

Result:        <B id="3"> ...

(: XPath 2.0 allows more flexibility to select elements with respect to namespaces. 
For these recipes the following XML document applies, with <NS:B id="3"/> as the context node. :)

<Test xmlns:NS="http://www.ora.com/xstlcbk/1" xmlns:NS2="http://www.ora.com/xstlcbk/2">
  <NS:A id="1"/>
  <NS2:A id="2"/>
  <NS:B id="3"/>
  <NS2:B id="3"/>
</Test>

(: Select the preceding sibling elements of the context whose namespace 
is the namespace associated with prefix NS :)
preceding-sibling::NS:*

Result:        <NS:A id="1"/>

(: Select the preceding sibling elements of the context whose local name is A :)
preceding-sibling::*:A

Result:        <NS:A id="1"/>, <NS2:A id="2"/>

Parent and ancestor axes

The parent axis (parent::) refers to the parent of the context node. The expression parent::X should not be confused with ../X. The former will produce a sequence of exactly one element provided the parent of the context is X or empty otherwise. The latter is a shorthand for parent::node()/X, which will select all siblings of the context node named X, including the context itself, should it be an X.

One can navigate to higher levels of the XML tree (parents, grandparents, great-grandparents, and so on) using either ancestor:: or ancestor-or-self::. The former excludes the context and the latter includes it.

(: Select the parent of the context node, provided it is an X element. Empty otherwise. :)
parent::X

(: Select the parent element of the context node. Can only be empty if the context 
is the top-level element. :)
parent::*

(: Select the parent if it is in the namespace associated with the prefix NS. 
The prefix must be defined; otherwise, it is an error. :)
parent::NS:*

(: Select the parent, regardless of its namespace, provided the local name is X. :)
parent::*:X

(: Select all ancestor elements (including the parent) named X. :)
ancestor::X    

(: Select the context, provided it is an X, and all ancestor elements named X. :)
ancestor-or-self::X

Preceding and following axes

The preceding and following axes have the potential to select a large number of nodes, because they consider all nodes that come before (preceding) or after (following) the context node in document order. The following axis excludes descendants, and the preceding axis excludes ancestors. Also don’t forget: both axes exclude namespace nodes and attributes.

(: All preceding element nodes named X. :)
preceding::X

(: The closest preceding element node named X. :)
preceding::X[1]

(: The furthest following element node named X. :)
following::X[last()]
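The exclusion rules for these two axes can be checked with a small Python model. The sketch below is built on assumptions introduced only for illustration: a made-up sample tree, an axes helper that is not part of any library, and elements as the only node kind (ElementTree does not expose attributes or namespaces as nodes). It shows that preceding skips ancestors, that following skips descendants, and that together with self, ancestor, and descendant every element is covered exactly once.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree; the element names are arbitrary.
root = ET.fromstring("<r><a><b/><c><d/></c></a><e><f/></e></r>")
order = list(root.iter())  # all elements in document order
parent_of = {child: parent for parent in order for child in parent}


def axes(ctx):
    """Return the tags selected by the five non-overlapping axes for ctx."""
    ancestors = []
    node = ctx
    while node in parent_of:
        node = parent_of[node]
        ancestors.append(node)
    descendants = list(ctx.iter())[1:]  # iter() yields ctx itself first
    pos = order.index(ctx)
    # preceding: earlier in document order, minus ancestors.
    preceding = [n for n in order[:pos] if n not in ancestors]
    # following: later in document order, minus descendants.
    following = [n for n in order[pos + 1:] if n not in descendants]
    groups = {
        "self": [ctx],
        "ancestor": ancestors,
        "descendant": descendants,
        "preceding": preceding,
        "following": following,
    }
    return {name: [n.tag for n in nodes] for name, nodes in groups.items()}


result = axes(root.find("a/c"))
print(result["preceding"])  # ['b']
print(result["following"])  # ['e', 'f']
# Every element lands in exactly one of the five groups.
print(sum(len(tags) for tags in result.values()) == len(order))  # True
```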

Discussion

XPath uses the notion of an axis to partition the document tree into subsets relative to some node called the context node. In general, these subsets overlap, but the ancestor, descendant, following, preceding, and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap, and together they contain all the nodes in the document. The context node is established by the XPath hosting language. In XSLT, the context is set via:

  • a template match (<xsl:template match="x"> ... </xsl:template>)

  • xsl:for-each

  • xsl:apply-templates

Effectively wielding the kinds of path expression shown in the solution is key to performing both simple and complex transformations. Experience with traditional programming languages sometimes leads to confusion and mistakes when using XPath. For example, I often used to catch myself writing something like <xsl:if test="preceding-sibling::X[1]"> </xsl:if> when I really intended <xsl:if test="preceding-sibling::*[1][self::X]"> </xsl:if>. This is probably because the latter is a less than intuitive way of saying “test if the immediately preceding sibling is an X.”

It is, of course, impossible to show every useful permutation of path expressions using axes. But if you understand the building blocks presented previously you are well on your way to decoding the meaning of constructs such as preceding-sibling::X[1]/descendant::Z[A/B] or worse.

1.2. Filtering Nodes

Problem

You need to select nodes based on the data they contain instead of, or in addition to, their names or position.

Solution

Many of the mini-recipes in Recipe 1.1 used predicates to filter nodes, but those predicates were based strictly on position of the node or node name. Here we consider a variety of predicates that filter based on data content. In these examples, we use a simple child element path X before each predicate, but one could equally substitute any path expression for X, including those in Recipe 1.1.

Tip

In the following examples, we use the XPath 2.0 comparison operators (eq, ne, lt, le, gt, and ge) instead of the operators (=, !=, <, <=, >, and >=). This is because when one is comparing atomic values, the new operators are preferred. In XPath 1.0, you only have the latter operators so make the appropriate substitution. The new operators were introduced in XPath 2.0 because they have simpler semantics and will probably be more efficient as a result. The complexity of the old operators comes when one considers cases where a sequence is on either side of the comparison. Recipe 1.8 covers this topic further.

Another point must be made for those working in XPath 2.0 because that version incorporates type information when a schema is available. That could cause some of the expressions below to raise type errors. For example, X[@a = 10] is not the same as X[@a = '10'] when the attribute a has an integer type. Here we assume there is no schema and therefore all atomic values have the type untypedAtomic. You can find more on this topic in Recipes 1.9 and 1.10.

(: Select X child elements that have an attribute named a. :)
X[@a]

(: Select X children that have at least one attribute. :)
X[@*]

(: Select X children that have at least three attributes. :)
X[count(@*) > 2]

(: Select X children whose attributes sum to a value less than 7. :)
X[sum(for $a in @* return number($a)) < 7] (: In XSLT 1.0 use sum(@*) &lt; 7 :)

(: Select X children that have no attributes named a. :)
X[not(@a)] 

(: Select X children that have no attributes. :)
X[not(@*)] 

(: Select X children that have an attribute named a with value '10'. :)
X[@a eq '10'] 

(: Select X children that have a child named Z with value '10'. :)
X[Z eq '10'] 

(: Select X children that have a child named Z with value not equal to '10'. :)
X[Z ne '10'] 

(: Select X children if they have at least one child text node. :)
X[text()] 

(: Select X children if they have a text node with at least one non-whitespace 
character. :)
X[text()[normalize-space(.)]] 

(: Select X children if they have any child node. :)
X[node()] 

(: Select X children if they contain a comment node. :)
X[comment()] 

(: Select X children if they have an @a whose numerical value is less than 10. 
This expression will work equally well in XPath 1.0 and 2.0 regardless of whether 
@a is a string or a numeric type. :)

X[number(@a) < 10] 

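(: Select X if it has at least one preceding sibling named Z with an attribute y 
that is not equal to 10. The general comparison != is used here because the path 
can select more than one attribute; ne would raise a type error in that case. :)

X[preceding-sibling::Z/@y != '10'] 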

(: Select X children whose string-value consists of a single space character. :)
X[. = ' ']

(: An odd way of getting an empty sequence! :)
X[false()]

(: Same as X. :)
X[true()]

(: X elements with exactly 5 child elements. :)
X[count(*) eq 5]

(: X elements with exactly 5 child nodes (including element, text, comment, 
and PI nodes but not attribute nodes). :)
X[count(node()) eq 5]

(: X elements with exactly 5 nodes of any kind. :)
X[count(@* | node()) eq 5]

(: The first X child, provided it has the value 'some text'; empty otherwise. :)
X[1][. eq 'some text']

(: Select all X children with the value 'some text' and return the first or empty 
if there is no such child. In simpler words, the first X child element that has the 
string-value 'some text'. :)
X[. eq 'some text'][1]
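Several of these predicates can be tried out with Python's standard xml.etree.ElementTree module. The sketch below rests on assumptions: the sample data is invented, positions and number are hypothetical helpers introduced here, and ElementTree only understands a subset of XPath 1.0 (so = stands in for eq, and predicates outside the subset are written as Python filters).

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample data; the context node is the p element.
context = ET.fromstring(
    "<p>"
    "<X a='10'/>"
    "<X a='3' b='1'><Z>10</Z></X>"
    "<X>some text</X>"
    "<X a='12' b='2' c='3'/>"
    "</p>"
)
xs = context.findall("X")


def positions(nodes):
    """Hypothetical helper: 1-based positions of the selected X children."""
    return [xs.index(node) + 1 for node in nodes]


def number(value):
    """Rough stand-in for XPath number(): NaN when missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


has_a = positions(context.findall("X[@a]"))                 # X[@a]
a_is_10 = positions(context.findall("X[@a='10']"))          # X[@a eq '10']
z_is_10 = positions(context.findall("X[Z='10']"))           # X[Z eq '10']
some_text = positions(context.findall("X[.='some text']"))  # X[. eq 'some text']

# Predicates outside ElementTree's subset, written as Python filters.
many_attrs = positions([x for x in xs if len(x.attrib) > 2])        # X[count(@*) > 2]
a_below_10 = positions([x for x in xs if number(x.get("a")) < 10])  # X[number(@a) < 10]

print(has_a, a_is_10, z_is_10, some_text, many_attrs, a_below_10)
# [1, 2, 4] [1] [2] [3] [4] [2]
```

The third X has no a attribute, so number yields NaN and the comparison is false, which mirrors the behavior of number(@a) < 10.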

Discussion

As with Recipe 1.1, it is impossible to completely cover every interesting permutation of filtering predicates. However, mastering the themes exemplified above should help you develop almost any filtering expression you desire. Also consider that one can create more complex conditions using the logical operators (and, or) and the function not().

X[number(@a) > 5 and number(@a) < 10]

When using predicates with complex path expressions, you need to understand the effect of parentheses.

(: Select the first Y child of every X child of the context node. This expression 
can result in a sequence of more than one Y. :)
X/Y[1]

(: Select the sequence of nodes X/Y and then take the first. This expression can 
at most select one Y. :)

(X/Y)[1]

A computer scientist would say that the predicate operator [] binds more tightly than the path operator /.
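The effect of this binding can be observed with Python's standard xml.etree.ElementTree module. The sample data below is invented for illustration, and because ElementTree has no parenthesized paths, a list slice is used as a stand-in for the outer [1].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two X children holding three Y elements in total.
context = ET.fromstring(
    "<p><X><Y id='a'/><Y id='b'/></X><X><Y id='c'/></X></p>"
)

# X/Y[1]: the predicate belongs to the Y step, so it is applied once per X.
first_per_x = [y.get("id") for y in context.findall("X/Y[1]")]

# (X/Y)[1]: the whole path is evaluated first, then one item is taken.
first_overall = [y.get("id") for y in context.findall("X/Y")[:1]]

print(first_per_x)    # ['a', 'c']
print(first_overall)  # ['a']
```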

1.3. Working with Sequences

Problem

You want to manipulate collections of arbitrary nodes and atomic values derived from an XML document or documents.

Solution

XPath 1.0

There is no notion of sequence in XPath 1.0 and hence these recipes are largely inapplicable. XPath 1.0 has node sets. There is an idiomatic way to construct the empty node set in XPath 1.0.

(: The empty node set :)
/..

XPath 2.0

(: The empty sequence constructor. :)
()

(: Sequence consisting of the single atomic item 1. :)
1

(: Use the comma operator to construct a sequence. Here we build a sequence 
of all X children of the context, followed by Y children, followed by Z children. :)
X, Y, Z

(: Use the to operator to construct ranges. :)
1 to 10

(: Here we combine comma with several ranges. :)
1 to 10, 100 to 110, 17, 19, 23

(: Variables and functions can be used as well. :)
1 to $x

1 to count(para)

(: Sequences do not nest so the following two sequences are the same. :)

((1,2,3), (4,5, (6,7), 8, 9, 10))

1,2,3,4,5,6,7,8,9,10

(: The to operator cannot create a decreasing sequence directly. :)
10 to 1 (: This sequence is empty! :)

(: You can accomplish the intended effect with the following. :)
for $n in 1 to 10 return 11 - $n

(: Remove duplicates from a sequence. :)
distinct-values($seq)

(: Return the size of a sequence. :)
count($seq)

(: Test if a sequence is empty. :)
empty($seq) (: prefer over count($seq) eq 0 :)

(: Locate the positions of an item in a sequence. Index-of produces a sequence 
of integers for every item in the first arg that is eq to the second. :)
index-of($seq, $item) 

(: Extract subsequences. :)

(: Up to 3 items from $seq, starting with the second. :)
subsequence($seq, 2, 3)

(: All items from $seq at position 3 to the end of the sequence. :)
subsequence($seq, 3)

(: Insert a sequence, $seq2, before the 3rd item in an input sequence, $seq1. :)
insert-before($seq1, 3, $seq2)

(: Construct a new sequence that contains all the items of $seq1 except the 3rd. :)
remove($seq1, 3)

(: If you need to remove several elements, you might consider an expression like the following. :)

$seq1[not(position() = (1,3,5))]

$seq1[position() gt 3 and position() lt 7]

Discussion

In XPath 2.0, every data item (value) is a sequence. Thus, the atomic value 1 is just as much a sequence as the result of the expression (1 to 10). Another way of saying this is that every XPath 2.0 expression evaluates to a sequence. A sequence can contain zero or more items, and these items can be nodes, atomic values, or a mixture of both. Order is significant when comparing sequences. You refer to the individual items of a sequence starting at position 1 (not 0, as someone with a C/Java background might expect).
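For readers more at home with zero-based lists, the 1-based sequence functions from the solution can be modeled in a few lines of Python. This is a simplified sketch, not the specification: it assumes integer positions only, and the function names are merely Python spellings of the XPath names.

```python
def to(start, end):
    """Model of 'start to end'; empty when start is greater than end."""
    return list(range(start, end + 1))


def subsequence(seq, start, length=None):
    """Model of subsequence() for integer arguments; positions are 1-based."""
    begin = max(start - 1, 0)
    if length is None:
        return seq[begin:]
    end = max(start - 1 + length, 0)
    return seq[begin:end]


def insert_before(seq, position, inserts):
    """Model of insert-before(); a position past the end appends."""
    index = max(position, 1) - 1
    return seq[:index] + inserts + seq[index:]


def remove(seq, position):
    """Model of remove(); an out-of-range position changes nothing."""
    return [item for pos, item in enumerate(seq, 1) if pos != position]


def index_of(seq, search):
    """Model of index-of(): every 1-based position whose item equals search."""
    return [pos for pos, item in enumerate(seq, 1) if item == search]


seq = [10, 20, 30, 40, 50]
print(subsequence(seq, 2, 3))         # [20, 30, 40]
print(subsequence(seq, 3))            # [30, 40, 50]
print(insert_before(seq, 3, [1, 2]))  # [10, 20, 1, 2, 30, 40, 50]
print(remove(seq, 3))                 # [10, 20, 40, 50]
print(index_of([5, 7, 5], 5))         # [1, 3]
print(to(10, 1))                      # []
print([11 - n for n in to(1, 10)])    # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```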

XPath 1.0 does not have sequences but rather node sets. Node sets are not as tidy a concept as sequences, but in many cases, the distinction is irrelevant. For example, any XPath 1.0 expression that uses the function count() should behave the same in 2.0 (empty() is new in XPath 2.0). The advantage of XPath 2.0 is that a sequence is a first-class construct that can be explicitly constructed and manipulated using a variety of new XPath 2.0 functions. The recipes in this section introduce many important sequence idioms, and you will find many others sprinkled through the recipes of this book.

1.4. Shrinking Conditional Code with If Expressions

Problem

Your complex XSLT code is too verbose due to the high overhead of XML when expressing simple if-then-else conditions.

Solution

XPath 1.0

There are a few tricks you can play in XPath 1.0 to avoid using XSLT’s verbose xsl:choose in simple situations. These tricks rely on the fact that false converts to 0 and true to 1 when used in a mathematical context.

So, for example, min, max, and absolute value can be calculated directly in XPath 1.0. In these examples, assume $x and $y contain integers.

(: min :)
($x <= $y) * $x + ($y < $x) * $y

(: max :)
($x >= $y) * $x + ($y > $x) * $y

(: abs :)
(1 - 2 * ($x < 0)) * $x
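Python also treats true as 1 and false as 0 in arithmetic, so the three formulas can be checked mechanically. The function names below are hypothetical; the bodies are direct transcriptions of the XPath 1.0 expressions.

```python
def xpath1_min(x, y):
    # ($x <= $y) * $x + ($y < $x) * $y
    return (x <= y) * x + (y < x) * y


def xpath1_max(x, y):
    # ($x >= $y) * $x + ($y > $x) * $y
    return (x >= y) * x + (y > x) * y


def xpath1_abs(x):
    # (1 - 2 * ($x < 0)) * $x
    return (1 - 2 * (x < 0)) * x


values = range(-3, 4)
# Compare each formula with Python's built-ins over a small grid of integers.
min_ok = all(xpath1_min(x, y) == min(x, y) for x in values for y in values)
max_ok = all(xpath1_max(x, y) == max(x, y) for x in values for y in values)
abs_ok = all(xpath1_abs(x) == abs(x) for x in values)

print(min_ok, max_ok, abs_ok)  # True True True
```

Exactly one of the two comparisons in the min and max formulas is true for any pair, so exactly one term survives the multiplication.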

XPath 2.0

For the simple cases in the XPath 1.0 section (min, max, abs), there are now built-in XPath functions. For other simple conditionals, use the new conditional if expression.

(: Default the value of a missing attribute to 10. :)
if (@x) then @x else 10

(: Default the value of a missing element to 'unauthorized'. :)
if (password) then password else 'unauthorized'

(: Guard against division by zero. :)
if ($d ne 0) then $x div $d else 0

(: A para element's text if it contains at least one non-whitespace character; otherwise, a single space. :)
if (normalize-space(para)) then string(para) else ' '

Discussion

If you are a veteran XSLT 1.0 programmer, you probably cringe every time you need to add some conditional code to a template. I know I do, and often go through pains to exploit XSLT’s pattern-matching constructs to minimize conditional code. This is not because such code is more complicated or inefficient in XSLT but rather because it is so darn verbose. A simple xsl:if is not that bad, but if you need to express if-then-else logic, you are now forced to use the bulkier xsl:choose. Yech!

In XSLT 2.0, there is an alternative but it is delivered in XPath 2.0 rather than XSLT 2.0 proper. On first exposure, one may get the impression that XPath was somehow bastardized via the introduction of what procedural programmers call flow of control statements. However, once you begin to use XPath 2.0 in its full glory, you should quickly conclude that both XPath and XSLT are bettered by these enhancements. Further, the XPath 2.0 conditional expression does not deprecate the xsl:if element but rather reduces the need to use it in just those cases where it is most awkward. As an illustration, compare the following snippets:

<!-- XSLT 1.0 -->
<xsl:variable name="size">
  <xsl:choose> 
    <xsl:when test="$x &gt; 3">big</xsl:when>
    <xsl:otherwise>small</xsl:otherwise>
  </xsl:choose>
</xsl:variable>

<!-- XSLT 2.0 -->
<xsl:variable name="size" select="if ($x gt 3) then 'big' else 'small' "/>

I think most readers will prefer the latter XPath 2.0 solution over the former XSLT 1.0 one.

One important fact about the XPath conditional expression is that the else is not optional. C programmers can appreciate this by comparing it to the a ? b : c expression in that language. Often one will use the empty sequence () when there is no other sensible value for the else part of the expression.

Conditional expressions are useful for defaulting in the absence of a schema that provides defaults.

(: Defaulting the value of an optional attribute :)
if (@optional) then @optional else 'some-default'

(: Defaulting the value of an optional element :)
if (optional) then optional else 'some-default'

Handling undefined or undesirable results in expressions is also a good application. In this example we have an application-specific reason to prefer 0 rather than an infinite value (or, for integer and decimal operands, a division-by-zero error) as the result.

if ($divisor ne 0) then $dividend div $divisor else 0

You can also create conditions that are more complex. The following code decodes an enumerated list:

if (size eq 'XXL') then 50
else if (size eq 'XL') then 45
else if (size eq 'L') then 40
else if (size eq 'M') then 34
else if (size eq 'S') then 32
else if (size eq 'XS') then 29
else -1

However, in this case, you might find a solution using sequences to be cleaner especially if you replace the literal sequences with variables that might be initialized from an external XML file.

(50,45,40,34,32,29,-1)[(index-of((('XXL', 'XL', 'L', 'M', 'S', 'XS')), size), 7)[1]]

Here we are assuming the context has only a single size child element; otherwise the expression is illegal (but you can then write size[1] instead). We are also relying on the fact that index-of returns an empty sequence when the search item is not found, which we concatenate with 7 to handle the else case.
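The lookup-with-fallback idiom may be easier to follow when spelled out step by step. The Python sketch below mirrors the expression using 1-based positions; SIZES, VALUES, and decode are names introduced here for illustration.

```python
SIZES = ["XXL", "XL", "L", "M", "S", "XS"]
VALUES = [50, 45, 40, 34, 32, 29, -1]


def decode(size):
    """Model of (values)[(index-of(sizes, size), 7)[1]] with 1-based positions."""
    # index-of: every matching position, or an empty list when nothing matches.
    matches = [pos for pos, name in enumerate(SIZES, 1) if name == size]
    # Appending 7 supplies the fallback; taking the first item mimics [1].
    position = (matches + [7])[0]
    return VALUES[position - 1]


print(decode("M"))    # 34
print(decode("XXS"))  # -1
```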

1.5. Eliminating Recursion with for Expressions

Problem

You want to derive an output sequence from an input sequence where each item in the output is an arbitrarily complex function of the input and the sizes of each sequence are not necessarily the same.

Solution

XPath 1.0

Not applicable in 1.0. Use a recursive XSLT template.

XPath 2.0

Use XPath 2.0’s for expression. Here we show several cases demonstrating how the for expression can map sequences of differing input and output sizes.

Aggregation

(: Sum of squares. :)
sum(for $x in $numbers return $x * $x)
 
(: Average of squares. :)
avg(for $x in $numbers return $x * $x)

Mapping

(: Map a sequence of words in all paragraphs to a sequence of word lengths. :)
for $x in //para/tokenize(., ' ')  return string-length($x) 

(: Map a sequence of words in a paragraph to a sequence of word lengths for words greater than three letters. :)
for $x in //para/tokenize(., ' ')  return if (string-length($x) gt 3) then string-length($x) else () 

(: Same as above but with a condition on the input sequence. :)
for $x in //para/tokenize(., ' ')[string-length(.) gt 3] return string-length($x)

Generating

(: Generate a sequence of squares of the first 100 integers. :)
for $i in 1 to 100 return $i * $i 

(: Generate a sequence of squares in reverse order. :)
for $i in 0 to 10 return  (10 - $i) * (10 - $i)

Expanding

(: Map a sequence of paragraphs to a duped sequence of paragraphs. :)
for $x in //para return ($x, $x)

(: Duplicate words. :)
for $x in //para/tokenize(., ' ') return ($x, $x)

(: Map words to word followed by word length. :)
for $x in //para/tokenize(., ' ') return ($x, string-length($x))

Joining

(: For each customer, output an id and the total of all the customer's orders. :)
for $cust in doc('customer.xml')/*/customer 
    return
        ($cust/id/text(), 
         sum(for $ord in doc('orders.xml')/*/order[custID eq $cust/id]
                 return ($ord/total)) )

Discussion

As I indicated in Recipe 1.4, the addition of control flow constructs into an expression language like XPath might at first be perceived as odd or even misguided. You will quickly overcome your doubts, however, when you experience the liberating power of these XPath 2.0 constructs. This is especially true for the XPath 2.0 for expression.

The power of for becomes most apparent when one considers how it can be applied to reduce many complicated recursive XSLT 1.0 solutions to just a single XPath 2.0 expression. Consider the problem of computing sums in XSLT 1.0. If all you need is a simple sum, there is no problem because the built-in XPath 1.0 sum function will do fine. However, if you need to compute the sum of squares, you are forced to write a larger, more awkward, and less transparent recursive template. In fact, a good portion of the first edition of this book was recipes for canned solutions to these recursive gymnastics. With XPath 2.0, a sum of squares becomes nothing more than sum(for $x in $numbers return $x * $x) where $numbers contains the sequence of numbers we wish to sum over. Think of the trees that I could have saved if this facility was in XPath 1.0!

However, the for expression is hiding even more power. You are not limited to just one iteration variable. Several variables can be combined to create nested loops that create sequences from interrelated nodes in a complex document.

(: Return a sequence consisting of para ids and the ids those para elements reference. :)
for $s in /*/section,
    $p in $s/para,
    $r in $p/ref
        return ($p/@id, $r)

You should note that, other than being more compact, the preceding expression is not semantically different from the following:

for $s in /*/section 
      return for $p in $s/para  
          return for $r in $p/ref
          return ($p/@id, $r)

You should also note that there is no need to use a nested for when the sequence you are producing is more elegantly represented by a traditional path expression.

(: This use of for is just a long-winded way of writing /*/section/para/ref. :)
for $s in /*/section, 
      $p in $s/para, 
      $r in $p/ref return $r

Sometimes you might want to know the position of each item in a sequence as you process it. You cannot use the position() function as you would in an xsl:for-each because an XPath for expression does not alter the context position. However, you can achieve the effect you want with the following expression:

for $pos in 1 to count($sequence),
    $item in $sequence[$pos]
        return ($item, $pos)
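One property of for worth internalizing is that whatever each iteration returns is concatenated into a single flat sequence. A rough Python model makes this visible; for_each is a hypothetical helper introduced here, and the sample lists are invented.

```python
from itertools import chain


def for_each(items, body):
    """Model of 'for $x in items return body($x)': results are concatenated flat."""
    return list(chain.from_iterable(body(item) for item in items))


numbers = [1, 2, 3]
words = ["to", "walk", "the", "path"]

# Aggregation: sum(for $x in $numbers return $x * $x)
sum_of_squares = sum(for_each(numbers, lambda x: [x * x]))

# Mapping with a filter: an empty result for an item simply disappears.
long_lengths = for_each(words, lambda w: [len(w)] if len(w) > 3 else [])

# Expanding: two items out for every item in.
duplicated = for_each(["a", "b"], lambda x: [x, x])

# Positions: iterate over 1-based positions and look the item up.
with_positions = for_each(
    range(1, len(words) + 1),
    lambda pos: [words[pos - 1], pos],
)

print(sum_of_squares)  # 14
print(long_lengths)    # [4, 4]
print(duplicated)      # ['a', 'a', 'b', 'b']
print(with_positions)  # ['to', 1, 'walk', 2, 'the', 3, 'path', 4]
```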

1.6. Taming Complex Logic Using Quantifiers

Problem

You need to test a sequence for the existence of a condition in some or all of its items.

Solution

XPath 1.0

If the condition is based on equality, then the semantics of the = and != operators in XPath 1.0 and 2.0 will suffice.

(: True if at least one section is referenced. :)
//section/@id = //ref/@idref

(: True if all section elements are referenced by some ref element. :)
count(//section) = count(//section[@id = //ref/@idref])

XPath 2.0

In XPath 2.0, use some and every expressions to do the same.

(: True if at least one section is referenced. :)
some $id in //section/@id satisfies $id = //ref/@idref

(: True if all section elements are referenced by some ref element. :)
every $id in //section/@id satisfies $id = //ref/@idref

However, you can go quite a bit further with less effort in XPath 2.0.

(: There exists a section that references every section except itself. :)
some $s in //section satisfies
    every $id in //section[@id ne $s/@id]/@id satisfies $id = $s/ref/@idref 

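(: $sequence2 is a leading sub-sequence (prefix) of $sequence1. The parentheses are 
required because a quantified expression cannot be a direct operand of and. :)
count($sequence2) <= count($sequence1) and
(every $pos in 1 to count($sequence1),
          $item1 in $sequence1[$pos],
          $item2 in $sequence2[$pos] satisfies $item1 = $item2)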

If you remove the count check in the preceding expression, it would assert that at least the first count($sequence1) items in $sequence2 are the same as corresponding items in $sequence1.

Discussion

The semantics of =, !=, <, >, <=, >= in XPath 1.0 and 2.0 sometimes surprise the uninitiated when one of the operands is a sequence or XPath 1.0 node set. This is because the operators evaluate to true if there is at least one pair of values from each side of the expression which compare according to the relation. In XPath 1.0, this can sometimes work to your advantage, as we have shown previously, but other times it can leave your head spinning and you longing to be back in the 5th grade where math made sense. For example, one would guess that $x = $x should always be true, but if $x is the empty sequence, it is not! This follows from the fact that you cannot find a pair of items within each empty sequence that are equal.
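The existential behavior of these operators is easy to model in Python with any(). The sketch below uses lists as stand-ins for sequences of atomic values; general_eq and general_ne are hypothetical names, and type conversion rules are deliberately ignored.

```python
def general_eq(left, right):
    """Model of =: true if some pair of items, one from each side, is equal."""
    return any(a == b for a in left for b in right)


def general_ne(left, right):
    """Model of !=: true if some pair of items, one from each side, differs."""
    return any(a != b for a in left for b in right)


empty = []
empty_equals_itself = general_eq(empty, empty)  # no pair exists at all
both_eq = general_eq([1, 2], [2, 3])            # the pair (2, 2) is equal
both_ne = general_ne([1, 2], [2, 3])            # the pair (1, 2) differs
negated_eq = not general_eq([1, 2], [2, 3])     # not(a = b) is not the same as a != b

# some/every correspond to any()/all(); every over an empty sequence is true.
every_on_empty = all(x > 0 for x in empty)

print(empty_equals_itself, both_eq, both_ne, negated_eq, every_on_empty)
# False True True False True
```

The same pair of sequences can therefore satisfy both = and != at once, which is why not(a = b) is usually the safer way to express "no item matches."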

1.7. Using Set Operations

Problem

You want to process sequences as if they were mathematical sets.

Solution

XPath 1.0

The union operation (|) over nodes is supported in XPath 1.0, but one needs a bit of trickery to achieve intersection and set difference.

(: union :)
$set1 | $set2

(: intersection :)
$set1[count(. | $set2) = count($set2)]

(: difference :)
$set1[count(. | $set2) != count($set2)]
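The counting trick works because a union never contains duplicates: adding a node to $set2 leaves its size unchanged exactly when the node is already a member. A small Python model shows the idea, using strings as stand-ins for node identities; the function names are introduced here only for illustration.

```python
def union(set1, set2):
    """Model of set1 | set2 on node identity; duplicates collapse."""
    return set(set1) | set(set2)


def intersection(set1, set2):
    """$set1[count(. | $set2) = count($set2)]: adding the node does not grow set2."""
    return [n for n in set1 if len(union([n], set2)) == len(set(set2))]


def difference(set1, set2):
    """$set1[count(. | $set2) != count($set2)]: adding the node grows set2."""
    return [n for n in set1 if len(union([n], set2)) != len(set(set2))]


set1 = ["a", "b", "c"]
set2 = ["b", "c", "d"]
print(intersection(set1, set2))  # ['b', 'c']
print(difference(set1, set2))    # ['a']
```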

XPath 2.0

The | operator in XPath 2.0 remains but union is added as an alias. In addition, intersect and except are added for intersection and set difference respectively.

$set1 union $set2

(: intersection :)
$set1 intersect $set2

(: difference :)
$set1 except $set2

Discussion

In XPath 2.0, node sets are replaced by sequences. Unlike node sets, sequences are ordered and can contain duplicate items. However, when using the XPath 2.0 set operations, duplicates and ordering are ignored so sequences behave just like sets. The result of a set operation will never contain duplicates even if the inputs did.

The except operator is used in an XPath 2.0 idiom for selecting all attributes but a given set.

(: All attributes except @a. :)
@* except @a  

(: All attributes except @a and @b. The parentheses are required because
the comma operator binds more loosely than except. :)
@* except (@a, @b)

In XPath 1.0, one needs the following more awkward expression:

@*[local-name(.) != 'a' and local-name(.) != 'b']

Interestingly enough, XPath only allows set operations over sequences of nodes. Atomic values are not allowed. This is because the set operations are over node identity and not value. One can get the effect of sets of values using the following XPath 2.0 expressions. For XPath 1.0, you will need to use XSLT recursion. See Chapter 8.

(: union :)
distinct-values( ($items1, $items2) )

(: intersection :)
distinct-values( $items1[. = $items2] )

(: difference :)
distinct-values( $items1[not(. = $items2)] )
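To see these value-based operations at work, here is a sketch with literal sequences standing in for $items1 and $items2. The numbers are sample values, not data from the recipe. The order in which distinct-values returns its results is implementation-dependent, so the comments list the expected members rather than a guaranteed order.

```xpath
(: union: contains 1, 2, 3, and 4, each once :)
distinct-values( ((1, 2, 2, 3), (3, 4)) )

(: intersection: contains only 3 :)
distinct-values( (1, 2, 2, 3)[. = (3, 4)] )

(: difference: contains 1 and 2, each once :)
distinct-values( (1, 2, 2, 3)[not(. = (3, 4))] )
```

The difference uses not(. = ...) rather than . != ... because the latter is true whenever the second sequence contains any value that differs from the context item, which is almost always the case.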

See Also

Recipe 9.1 and Recipe 9.2 have more examples of set operations.

1.8. Using Node Comparisons

Problem

You want to identify nodes or relate them by their position in a document.

Solution

XPath 1.0

In these examples, assume $x and $y each contain a single node from the same document. Also, recall that document order means the sequence in which nodes appear within a document.

(: Test if $x and $y are the same exact node. :)
generate-id($x) = generate-id($y)

(: You can also take advantage of the | operator's removal of duplicates. :)
count($x|$y) = 1

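(: Test if $x precedes $y in document order - note that this does not work if $x 
or $y is an attribute. The second clause tests by node identity whether $x is
an ancestor of $y; a plain = would compare string values instead. :)
count($x/preceding::node()) < count($y/preceding::node()) or 
count($x | $y/ancestor::node()) = count($y/ancestor::node())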

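(: Test if $x follows $y in document order - note that this does not work if $x 
or $y is an attribute. The second clause tests by node identity whether $y is
an ancestor of $x; a plain = would compare string values instead. :)
count($x/following::node()) < count($y/following::node()) or
count($y | $x/ancestor::node()) = count($x/ancestor::node())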

XPath 2.0

(: Test if $x and $y are the same exact node. :)
$x is $y

(: Test if $x precedes $y in document order. :)
$x << $y

(: Test if $x follows $y in document order. :)
$x >> $y

Discussion

The new XPath 2.0 node comparison operators are likely to be more efficient and certainly easier to understand than their XPath 1.0 counterparts. However, if you are using XSLT 2.0, you will not find too many situations where these operators are required. In many situations where you think you need << or >>, the xsl:for-each-group element is the better choice. See Recipe 6.2 for examples.
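One case where the operators do read naturally is selecting the nodes that lie between two markers. The following is a minimal sketch, assuming $start and $end (hypothetical variable names) each hold a single element, that both are children of the same parent, and that $start comes first.

```xpath
(: Sibling elements strictly between $start and $end, using a node comparison. :)
$start/following-sibling::*[. << $end]

(: The same selection expressed with a set operation. :)
$start/following-sibling::* intersect $end/preceding-sibling::*
```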

1.9. Coping with XPath 2.0’s Extended Type System

Problem

XPath 2.0’s stricter type rules have you cursing the W3C and longing for Perl.

Solution

Most incompatibilities between XPath/XSLT 1.0 and 2.0 come from type errors. This is true regardless of whether a schema is present or not. You can eliminate many problems encountered in porting legacy XSLT 1.0 to XSLT 2.0 with respect to XPath differences by running in 1.0 compatibility mode.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- ... -->

</xsl:stylesheet>

In my opinion, eventually you will want to stop using compatibility mode. XPath 2.0 provides several facilities for dealing with type conversions. First, you can use conversion functions explicitly.

(: Convert the first X child of the context to a number. :)
number(X[1]) + 17

(: Convert a number in $n to a string. :)
concat("id-", string($n))

XPath 2.0 also provides type constructors so you can explicitly control the interpretation of a string.

(: Construct a date from a string. :)
xs:date("2005-06-01") 

(: Construct doubles from strings. :)
xs:double("1.1e8") + xs:double("23000")

Finally, XPath 2.0 has the operators castable as, cast as, and treat as. Most of the time, you want to use the first two.

if ($x castable as xs:date) then $x cast as xs:date else xs:date("1970-01-01")

The operator, treat as, is not a conversion per se but rather an assertion that tells the XPath processor that you promise at runtime a value will conform to a specified type. If this turns out not to be the case, then a type error will occur. XPath 2.0 added treat as so XPath implementers could perform static (compile time) type checking in addition to dynamic type checking while allowing programmers to selectively disable static type checks. Static type checking XSLT 2.0 implementations will likely be rare so you can ignore treat as for the time being. It is far more likely to arise in higher-end XQuery processors that do static type checking to facilitate various optimizations.
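The difference between converting and asserting is easiest to see side by side. This sketch uses a string literal as a sample value; the comments describe the behavior expected from the rules just given, not the output of any particular processor.

```xpath
(: cast as converts: the string becomes the integer 42, so the sum is 43. :)
("42" cast as xs:integer) + 1

(: treat as only asserts: "42" is an xs:string, not an xs:integer, so this
   raises a type error at runtime instead of converting. :)
("42" treat as xs:integer) + 1

(: treat as succeeds, and changes nothing, when the value already has the type. :)
(42 treat as xs:integer) + 1
```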

Discussion

Running in 1.0 compatibility mode with an XSLT 2.0 processor does not mean you cannot use any of the new 2.0 features. It simply enables certain XPath 1.0 conversion rules.

  • It allows non-numeric types used in a context where numbers are expected to convert automatically to numbers via atomization followed by application of the number() function.

(: In compatibility mode, the following evaluates to 18.1 but is a type error in 2.0. :)
"1.1" + "17"

  • It allows non-string types used in a context where strings are expected to convert automatically to strings via atomization followed by application of the string() function.

(: In compatibility mode, the following evaluates to 2 but is a type error in 2.0. :)
string-length(1 + 2 + 3 + 4 + 5)

  • It automatically discards all but the first item from sequences of two or more items when they are used in a context where a singleton is expected. This often happens when one is passing the result of a path expression to a function.

<poem>
    <line>There once was a programmer from Nantucket.</line> 
    <line>Who liked his bits in a bucket.</line> 
    <line>He said with a grin</line> 
    <line>and drops of coffee on his chin,</line> 
    <line>"If XSLT had a left-shift, I would love it!"</line> 
</poem>

(: In compatibility mode, both expressions evaluate to 43 but the first is a
type error in 2.0. :)
string-length(/poem/line)
string-length(/poem/line[1])
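When you eventually turn compatibility mode off, each of the three implicit conversions listed above can be written out explicitly. The following sketch shows one way to do so, reusing the same sample expressions; it is not the only option, since xs:double() would serve equally well in place of number().

```xpath
(: Convert the strings to numbers before adding: 18.1 :)
number("1.1") + number("17")

(: Convert the number to a string before measuring it: 2 :)
string-length(string(1 + 2 + 3 + 4 + 5))

(: Select the single item you want instead of relying on truncation: 43 :)
string-length(/poem/line[1])
```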

1.10. Exploiting XPath 2.0’s Extended Type System

Problem

You use XML Schema religiously when processing XML and would like to reap its rewards.

Solution

If you validate your documents against a schema, the resulting nodes become annotated with type information. You can then test for these types in XPath 2.0 (and while matching templates in XSLT 2.0).

(: Test if all invoiceDate elements have been validated as dates. :)
if (order/invoiceDate instance of element(*, xs:date)+)
then "invoicing complete"
else "invoicing incomplete"

Warning

instance of is only useful in the presence of schema validation. In addition, it is not the same as castable as. For instance, 10 castable as xs:positiveInteger is always true but 10 instance of xs:positiveInteger is never true because integer literals are labeled as xs:integer.
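The distinction can be checked with a handful of literal tests. This sketch assumes a processor that makes the derived type xs:positiveInteger available, which a basic (non-schema-aware) processor is not required to do.

```xpath
(: true: an integer literal is labeled xs:integer :)
10 instance of xs:integer

(: true: xs:integer is derived from xs:decimal :)
10 instance of xs:decimal

(: false: the label is xs:integer, not the narrower xs:positiveInteger :)
10 instance of xs:positiveInteger

(: true: the value could be converted if you asked :)
10 castable as xs:positiveInteger

(: true: after an explicit cast the value carries the narrower label :)
(10 cast as xs:positiveInteger) instance of xs:positiveInteger
```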

However, the benefit of validation is not simply the ability to test types using instance of. It comes rather from the safety and convenience of knowing that there will be no type error surprises once validation has passed. This can lead to more concise stylesheets.

(: Without validation, you should code like this. :)
for $order in Order return xs:date($order/invoiceDate) 
- xs:date($order/createDate)

(: If you know all date elements have been validated, you can dispense with 
the xs:date constructor. :)
for $order in Order return $order/invoiceDate - $order/createDate

Discussion

My personal preference is to use XML Schemas as specification documents and not validation tools. Therefore, I tend to write XSLT transformations in ways that are resilient to type errors and use explicit conversions where needed. Stylesheets written in this manner will work in the presence of validation or not.

Once you begin to write stylesheets that depend on validation, you are locked into implementations that perform validation. On the other hand, if your company standards say all XML documents will be schema-validated before processing, then you can simplify your XSLT based on assurances that certain data types will appear in certain situations.
