# Predicates

Predicates are used in a path expression to filter the results so that they contain only nodes that meet specific criteria. Using a predicate, you can, for example, select only the elements that have a certain value for an attribute or child element, using a predicate like `[@dept = "ACC"]`. You can also select only elements that have a particular attribute or child element, using a predicate such as `[color]`, or elements that occur in a particular position within their parent, using a predicate such as `[3]`.

The syntax of a predicate is simply an expression in square brackets ([ and ]). Table 4-6 shows some examples of predicates.

| Example | Meaning |
| --- | --- |
| `product[name = "Floppy Sun Hat"]` | All `product` elements that have a `name` child whose value is equal to `Floppy Sun Hat` |
| `product[number < 500]` | All `product` elements that have a `number` child whose value is less than 500 |
| `product[@dept = "ACC"]` | All `product` elements that have a `dept` attribute whose value is `ACC` |
| `product[desc]` | All `product` elements that have at least one `desc` child |
| `product[@dept]` | All `product` elements that have a `dept` attribute |
| `product[@dept]/number` | All `number` children of `product` elements that have a `dept` attribute |

If the expression in a predicate evaluates to anything other than a number, its effective Boolean value is determined. This means that if it evaluates to the `xs:boolean` value `false`, the number 0 or `NaN`, a zero-length string, or the empty sequence, it is considered false. In most other cases, it is considered true. If the effective Boolean value is true for a particular node, that node is returned. If it is false, the node is not returned.

If the expression evaluates to a number, it is interpreted as the position as described in "Using Positions in Predicates" later in this chapter.

As you can see from the last example in Table 4-6, the predicate is not required to appear at the end of the path expression; it can appear at the end of any step.

Note that `product[number]` is different from `product/number`. While both expressions filter out products that have no `number` child, in the former expression, the `product` element is returned. In the latter case, the `number` element is returned.
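As a minimal sketch of the predicate forms in Table 4-6, the following query can be run on its own. The inline `$catalog` element is a hypothetical stand-in for `catalog.xml`; its products are assumptions made for illustration and are not the contents of the actual file.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>
    <product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name><desc>Roomy</desc></product>
    <product><name>Gift Card</name></product>
  </catalog>
return (
  (: comparison: only the product numbered 443 qualifies :)
  $catalog/product[number < 500]/name/string(),
  (: existence of a child element: only the travel bag has a desc :)
  $catalog/product[desc]/name/string(),
  (: existence of an attribute: the gift card has no dept :)
  $catalog/product[@dept]/name/string(),
  (: predicate in the middle of a path: the values 557 and 443 :)
  $catalog/product[@dept]/number/string(),
  (: product[number] returns product elements ... :)
  (for $e in $catalog/product[number] return local-name($e)),
  (: ... while product/number returns number elements :)
  (for $e in $catalog/product/number return local-name($e))
)
```

The gift card has no `number` child, so `number < 500` compares an empty sequence and the predicate is false for that product without raising an error.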

## Comparisons in Predicates

The examples in the previous section use general comparison operators like `=` and `<`. You can also use the corresponding value comparison operators, such as `eq` and `lt`, but you should be aware of the difference. Value comparison operators only allow a single value, while general comparison operators allow sequences of zero, one, or more values. Therefore, the path expression:

`doc("prices.xml")//priceList[@effDate eq '2006-11-15']`

is acceptable, because each `priceList` element can have only one `effDate` attribute. However, if you wanted to find all the `priceList` elements that contain the product 557, you might try the expression:

`doc("prices.xml")//priceList[prod/@num eq 557]`

This will raise an error because the expression `prod/@num` returns more than one value per `priceList`. By contrast:

`doc("prices.xml")//priceList[prod/@num = 557]`

returns a `priceList` if it has at least one `prod` child whose `num` attribute is equal to 557. It might have other `prod` children whose numbers are not equal to 557.

In both cases, if a particular `priceList` does not have any `prod` children with `num` attributes, the expression does not return that `priceList`, but it does not raise an error either.

Another difference is that value comparison operators treat all untyped data like strings. If we fixed the previous problem with `eq` by returning `prod` nodes instead, as in:

`doc("prices.xml")//priceList/prod[@num eq 557]`

it would still raise an error if no schema were present, because it treats the `num` attribute like a string, which can't be compared to a number. The `=` operator, on the other hand, will cast the untyped value of the `num` attribute to a numeric type (`xs:double`) and then compare it to 557, as you would expect.

For these reasons, general comparison operators are easier to use than value comparison operators in predicates when children are untyped or repeating. The downside of general comparison operators is that they also make it less likely that the processor will catch any mistakes you make. In addition, they may be more expensive to evaluate because it's harder for the processor to make use of indexes.
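The difference between the two kinds of operators can be sketched with a small, self-contained query. The inline `$prices` element is a hypothetical, schema-less stand-in for `prices.xml`, and the explicit `xs:integer` cast is one possible way to keep using `eq` on untyped data, not the only one.

```xquery
(: Hypothetical untyped data standing in for prices.xml :)
let $prices :=
  <prices>
    <priceList effDate="2006-11-15">
      <prod num="557"><price>29.99</price></prod>
      <prod num="563"><price>69.99</price></prod>
    </priceList>
    <priceList effDate="2006-12-01">
      <prod num="443"><price>39.99</price></prod>
    </priceList>
  </prices>
return (
  (: one effDate per priceList, compared as a string: eq is safe; count is 1 :)
  count($prices/priceList[@effDate eq '2006-11-15']),
  (: general comparison: true if any prod/@num equals 557; count is 1 :)
  count($prices/priceList[prod/@num = 557]),
  (: value comparison on a single, explicitly cast value; count is 1 :)
  count($prices/priceList/prod[xs:integer(@num) eq 557])
  (: The following are left out because they raise type errors:
       $prices/priceList[prod/@num eq 557]    more than one value on the left
       $prices/priceList/prod[@num eq 557]    untyped value treated as a string :)
)
```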

## Using Positions in Predicates

Another use of predicates is to specify the numeric position of an item within the sequence of items currently being processed. These are sometimes called, predictably, positional predicates. For example, if you want the fourth product in the catalog, you can specify:

`doc("catalog.xml")/catalog/product[4]`

Any predicate expression that evaluates to a single number is treated as a positional predicate. If you specify a number that is greater than the number of items in the context sequence, it does not raise an error; it simply does not return any nodes. For example:

`doc("catalog.xml")/catalog/product[99]`

returns the empty sequence.

### Understanding positional predicates

With positional predicates, it is important to understand that the position is the position within the current sequence of items being processed, not the position of an element relative to its parent's children. Consider the expression:

`doc("catalog.xml")/catalog/product/name[1]`

This expression refers to the first `name` child of each `product`; the step `name[1]` is evaluated once for every `product` element. It does not necessarily mean that the `name` element is the first child of `product`.

It also does not return the first `name` element that appears in the document as a whole. If you wanted just the first `name` element in the document, you could use the expression:

`(doc("catalog.xml")/catalog/product/name)[1]`

because the parentheses change the order of evaluation. First, all the `name` elements are returned; then, the first one of those is selected. Alternatively, you could use:

`doc("catalog.xml")/catalog/descendant::name[1]`

because the sequence of descendants is evaluated first, then the predicate is applied. However, this is different from the abbreviated expression:

`doc("catalog.xml")/catalog//name[1]`

which, like the first example, returns the first `name` child of each of the products. That's because it's an abbreviation for:

`doc("catalog.xml")/catalog/descendant-or-self::node( )/name[1]`
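To compare the four forms side by side, here is a minimal sketch that counts what each one selects. The inline `$catalog` element is a hypothetical stand-in for `catalog.xml` with three products, each with one `name` child that is deliberately not its first child.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product><number>557</number><name>Fleece Pullover</name></product>
    <product><number>563</number><name>Floppy Sun Hat</name></product>
    <product><number>443</number><name>Deluxe Travel Bag</name></product>
  </catalog>
return (
  (: first name child of each product: 3 nodes :)
  count($catalog/product/name[1]),
  (: all names collected first, then the first one chosen: 1 node :)
  count(($catalog/product/name)[1]),
  (: descendants collected first, then the predicate applied: 1 node :)
  count($catalog/descendant::name[1]),
  (: expands to descendant-or-self::node()/name[1], so again 3 nodes :)
  count($catalog//name[1])
)
```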

### The position and last functions

The `position` and `last` functions are also useful when writing predicates based on position. The `position` function returns the position of the context item within the context sequence (the current sequence of items being processed). The function takes no arguments and returns an integer representing the position (starting with 1, not 0) of the context item. For example:

`doc("catalog.xml")/catalog/product[position( ) < 3]`

returns the first two `product` children of `catalog`. You could also select the first two children of each `product`, with any name, using:

`doc("catalog.xml")/catalog/product/*[position( ) < 3]`

The wildcard `*` matches child elements with any name. Note that the predicate `[position( ) = 3]` is equivalent to the predicate `[3]`, so the `position` function is not very useful in this case.

### Warning

When using positional predicates, you should be aware that the `to` keyword does not work as you might expect when used in predicates. If you want the first three products, it may be tempting to use the syntax:

`doc("catalog.xml")/catalog/product[1 to 3]`

However, this will raise an error[*] because the predicate evaluates to multiple numbers instead of a single one. You can, however, use the syntax:

`doc("catalog.xml")/catalog/product[position( ) = (1 to 3)]`

You can also use the `subsequence` function to limit the results based on position, as in:

`doc("catalog.xml")/catalog/subsequence(product, 1, 3)`
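The two working alternatives can be checked against each other with a short sketch. The inline `$catalog` element and the variable names `$byPosition` and `$bySubsequence` are assumptions introduced for illustration.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product><name>Fleece Pullover</name></product>
    <product><name>Floppy Sun Hat</name></product>
    <product><name>Deluxe Travel Bag</name></product>
    <product><name>Cotton Dress Shirt</name></product>
  </catalog>
(: general comparison of position() against the sequence (1, 2, 3) :)
let $byPosition := $catalog/product[position() = (1 to 3)]
(: the same selection using the subsequence function as a path step :)
let $bySubsequence := $catalog/subsequence(product, 1, 3)
return (
  (: names of the first two products :)
  $catalog/product[position() < 3]/name/string(),
  (: names of the first three products :)
  $byPosition/name/string(),
  (: true: both approaches select the same three products :)
  deep-equal($byPosition, $bySubsequence)
)
```

The predicate `[position() = (1 to 3)]` works because `=` is a general comparison, which is true when the position equals any one of the values in the sequence.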

The `last` function returns the number of items in the current sequence. It takes no arguments and returns an integer. The `last` function is useful for testing whether an item is the last one in the sequence. For example, `catalog/product[last( )]` returns the last `product` child of `catalog`.

Table 4-7 shows some examples of predicates that use the position of the item. The descriptions assume that there is only one `catalog` element, which is the case in the `catalog.xml` example.

| Example | Description |
| --- | --- |
| `product[2]` | The second `product` child of `catalog` |
| `product[position( ) = 2]` | The second `product` child of `catalog` |
| `product[position( ) > 1]` | All `product` children of `catalog` after the first one |
| `product[last( )-1]` | The second to last `product` child of `catalog` |
| `product[last( )]` | The last `product` child of `catalog` |
| `*[2]` | The second child of `catalog`, regardless of name |
| `product[3]/*[2]` | The second child of the third `product` child of `catalog` |

In XQuery, it's very unusual to use the `position` or `last` functions anywhere except within a predicate. It's not an error, however, as long as the context item is defined. For example, `a/last( )` returns the same number as `count(a)`, once for each item in `a`.
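The following sketch exercises `last` both inside and outside a predicate. The inline `$catalog` element is a hypothetical stand-in for `catalog.xml` with four products.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product><name>Fleece Pullover</name></product>
    <product><name>Floppy Sun Hat</name></product>
    <product><name>Deluxe Travel Bag</name></product>
    <product><name>Cotton Dress Shirt</name></product>
  </catalog>
return (
  (: the last product: Cotton Dress Shirt :)
  $catalog/product[last()]/name/string(),
  (: the second to last product: Deluxe Travel Bag :)
  $catalog/product[last() - 1]/name/string(),
  (: every product after the first :)
  $catalog/product[position() > 1]/name/string(),
  (: 4 :)
  count($catalog/product),
  (: last() outside a predicate: evaluated once per product, giving 4 each time :)
  $catalog/product/last()
)
```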

### Positional predicates and reverse axes

Oddly, positional predicates have the opposite meaning when using reverse axes such as `ancestor`, `ancestor-or-self`, `preceding`, or `preceding-sibling`. These axes, like all axes, return nodes in document order. For example, the expression:

`doc("catalog.xml")//i/ancestor::*`

returns the ancestors of the `i` element in document order, namely the `catalog` element, followed by the fourth `product` element, followed by the `desc` element. However, if you use a positional predicate, as in:

`doc("catalog.xml")//i/ancestor::*[1]`

you might expect to get the `catalog` element, but you will actually get the nearest ancestor, the `desc` element. The expression:

`doc("catalog.xml")//i/ancestor::*[last( )]`

will give you the `catalog` element.
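A minimal sketch of this behavior follows. The inline `$catalog` element is a hypothetical, single-product stand-in for `catalog.xml`, so the ancestors of `i` here are `catalog`, `product`, and `desc`.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product>
      <name>Cotton Dress Shirt</name>
      <desc>Our <i>favorite</i> shirt!</desc>
    </product>
  </catalog>
let $i := $catalog//i
return (
  (: the axis result is in document order: catalog, product, desc :)
  (for $a in $i/ancestor::* return local-name($a)),
  (: positions count backward along a reverse axis, so [1] is the nearest: desc :)
  local-name($i/ancestor::*[1]),
  (: the outermost ancestor: catalog :)
  local-name($i/ancestor::*[last()])
)
```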

## Using Multiple Predicates

Multiple predicates can be chained together to filter items based on more than one constraint. For example:

`doc("catalog.xml")/catalog/product[number < 500][@dept = "ACC"]`

selects only `product` elements that have a `number` child whose value is less than 500 and a `dept` attribute whose value is equal to `ACC`. The same selection can also be expressed as:

`doc("catalog.xml")/catalog/product[number < 500 and @dept = "ACC"]`

It is sometimes useful to combine the positional predicates with other predicates, as in:

`doc("catalog.xml")/catalog/product[@dept = "ACC"][2]`

which represents "the second `product` child that has a `dept` attribute whose value is `ACC`," namely the third `product` element. The order of the predicates is significant. If the previous example is changed to:

`doc("catalog.xml")/catalog/product[2][@dept = "ACC"]`

it means something different, namely "the second `product` child, if it has a `dept` attribute whose value is `ACC`." This is because the predicate changes the context, and the context node for the second predicate in this case is the second `product` element.
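The effect of predicate order can be sketched with a self-contained query. The inline `$catalog` element is a hypothetical stand-in for `catalog.xml`, arranged so that the second and third products belong to the `ACC` department.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>
    <product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>
    <product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>
    <product dept="MEN"><number>784</number><name>Cotton Dress Shirt</name></product>
  </catalog>
return (
  (: second of the ACC products, which is the third product: Deluxe Travel Bag :)
  $catalog/product[@dept = "ACC"][2]/name/string(),
  (: the second product, kept because its dept is ACC: Floppy Sun Hat :)
  $catalog/product[2][@dept = "ACC"]/name/string(),
  (: the first product is WMN, so nothing is selected: 0 :)
  count($catalog/product[1][@dept = "ACC"]),
  (: chained non-positional predicates and a single "and" agree: 1 and 1 :)
  count($catalog/product[number < 500][@dept = "ACC"]),
  count($catalog/product[number < 500 and @dept = "ACC"])
)
```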

## More Complex Predicates

So far, the examples of predicates have been simple path expressions, comparison expressions, and numbers. In fact, any expression is allowed in a predicate, making it a very flexible construct. For example, predicates can contain function calls, as in:

`doc("catalog.xml")/catalog/product[contains(@dept, "A")]`

which returns all `product` children whose `dept` attribute contains the letter A. They can contain conditional expressions, as in:

```
doc("catalog.xml")/catalog/product[if ($descFilter)
                                   then desc else true( )]
```

which filters `product` elements based on their `desc` child only if the variable `$descFilter` is true. They can also contain expressions that combine sequences, as in:

`doc("catalog.xml")/catalog/product[* except number]`

which returns all `product` children that have at least one child other than `number`. General comparisons with multiple values can be used, as in:

`doc("catalog.xml")/catalog/product[@dept = ("ACC", "WMN", "MEN")]`

which returns products whose `dept` attribute value is any of those three values. This is similar to a SQL "in" clause.

To retrieve every third `product` child of `catalog`, you could use the expression:

`doc("catalog.xml")/catalog/product[position( ) mod 3 = 0]`

because it selects all the products whose position is divisible by 3.

Predicates can even contain path expressions that themselves have predicates. For example:

`doc("catalog.xml")/catalog/product[*[3][self::colorChoices]]`

can be used to find all `product` elements whose third child element is `colorChoices`. The `*[3][self::colorChoices]` is part of a separate path expression that is itself within a predicate. `*[3]` selects the third child element of `product`, and `[self::colorChoices]` is a way of testing the name of the current context element.
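The predicates in this section can be tried together in one sketch. The inline `$catalog` element is a hypothetical stand-in for `catalog.xml`, and `$descFilter` is bound to `true()` here only so that the query is self-contained.

```xquery
(: Hypothetical data standing in for catalog.xml :)
let $catalog :=
  <catalog>
    <product dept="WMN"><number>557</number><name>Fleece Pullover</name><colorChoices>navy black</colorChoices></product>
    <product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>
    <product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>
    <product dept="MEN"><number>784</number><name>Cotton Dress Shirt</name><colorChoices>white gray</colorChoices><desc>Our favorite shirt!</desc></product>
  </catalog>
(: assumed value; in practice this might be an external variable :)
let $descFilter := true()
return (
  (: function call: the two ACC products contain the letter A; count is 2 :)
  count($catalog/product[contains(@dept, "A")]),
  (: conditional: with $descFilter true, only the product with a desc; count is 1 :)
  count($catalog/product[if ($descFilter) then desc else true()]),
  (: every product has a child other than number; count is 4 :)
  count($catalog/product[* except number]),
  (: comparison against a sequence, similar to an SQL "in"; count is 3 :)
  count($catalog/product[@dept = ("ACC", "WMN")]),
  (: every third product: Deluxe Travel Bag :)
  $catalog/product[position() mod 3 = 0]/name/string(),
  (: nested predicate: the first and fourth products have colorChoices third; count is 2 :)
  count($catalog/product[*[3][self::colorChoices]])
)
```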

Predicates are not limited to use with path expressions. They can be used with any sequence. For example:

`(1 to 100)[. mod 5 = 0]`

can be used to return the integers from 1 to 100 that are divisible by 5. Another example is:

`(@price, 0.0)[1]`

which selects the `price` attribute if it exists, or the decimal value `0.0` otherwise.
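Both uses of predicates on plain sequences can be sketched as follows. The inline `$items` element and its `item` children are hypothetical data invented for this example, and the `xs:decimal` call is added only so that both results have the same type.

```xquery
(: Hypothetical data: one item with a price attribute, one without :)
let $items :=
  <items>
    <item price="12.50">Hat</item>
    <item>Bag</item>
  </items>
return (
  (: predicate on a sequence of integers: 5, 10, 15, 20 :)
  (1 to 20)[. mod 5 = 0],
  (: 20 of the integers from 1 to 100 are divisible by 5 :)
  count((1 to 100)[. mod 5 = 0]),
  (: default-value idiom: the price attribute if present, otherwise 0.0 :)
  (for $item in $items/item
   return xs:decimal(($item/@price, 0.0)[1]))
)
```

The idiom works because a missing attribute contributes nothing to the sequence, which leaves `0.0` in first position.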

[*] Although several implementations erroneously support this construct.
