XQuery

By Priscilla Walmsley

Table of Contents

Chapter 1: Introduction to XQuery
This chapter provides background on the purpose and capabilities of XQuery. It also gives a quick introduction to the features of XQuery that are covered in more detail later in the book. It is designed to provide a basic familiarity with the most commonly used kinds of expressions, without getting too bogged down in the details.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What Is XQuery?
The use of XML has exploded in recent years. An enormous amount of information is now stored in XML, both in XML databases and in documents on a filesystem. This includes highly structured data, such as sales figures; semistructured data, such as product catalogs and yellow pages; and relatively unstructured data, such as letters and books. Even more information is passed between systems as transitory XML documents.
All of this data is used for a variety of purposes. For example, sales figures may be useful for compiling financial statements that may be published on the Web, reporting results to the tax authorities, calculating bonuses for salespeople, or creating internal reports for planning. For each of these uses, we are interested in different elements of the data and expect it to be formatted and transformed according to our needs.
XQuery is a query language designed by the W3C to address these needs. It allows you to select the XML data elements of interest, reorganize and possibly transform them, and return the results in a structure of your choosing.
XQuery has a rich set of features that allow many different types of operations on XML data and documents, including:
  • Selecting information based on specific criteria
  • Filtering out unwanted information
  • Searching for information within a document or set of documents
  • Joining data from multiple documents or collections of documents
  • Sorting, grouping, and aggregating data
  • Transforming and restructuring XML data into another XML vocabulary or structure
  • Performing arithmetic calculations on numbers and dates
  • Manipulating strings to reformat text
Additional content appearing in this section has been removed.
Easing into XQuery
The rest of this chapter takes you through a set of example queries, each of which builds on the previous one. Three XML documents are used repeatedly as input documents to the query examples throughout the book. They will be used so frequently that it may be worth printing them from the companion web site so that you can view them alongside the examples.
These three examples are quite simplistic, but they are useful for educational purposes because they are easy to learn and remember while looking at query examples. In reality, most XQuery queries will be executed against much more complex documents, and often against multiple documents as a group. However, in order to keep the examples reasonably concise and clear, this book will work with smaller documents that have a representative mix of XML characteristics.
The catalog.xml document is a product catalog containing general information about products (Example 1-1).
Example 1-1. Product catalog input document (catalog.xml)
<catalog>
  <product dept="WMN">
    <number>557</number>
    <name language="en">Fleece Pullover</name>
    <colorChoices>navy black</colorChoices>
  </product>
  <product dept="ACC">
    <number>563</number>
    <name language="en">Floppy Sun Hat</name>
  </product>
  <product dept="ACC">
    <number>443</number>
    <name language="en">Deluxe Travel Bag</name>
  </product>
  <product dept="MEN">
    <number>784</number>
    <name language="en">Cotton Dress Shirt</name>
    <colorChoices>white gray</colorChoices>
    <desc>Our <i>favorite</i> shirt!</desc>
  </product>
</catalog>
The prices.xml document contains prices for the products, based on effective dates (Example 1-2).
Example 1-2. Price information input document (prices.xml)
<prices>
  <priceList effDate="2006-11-15">
    <prod num="557">
      <price currency="USD">29.99</price>
      <discount type="CLR">10.00</discount>
    </prod>
    <prod num="563">
      <price currency="USD">69.99</price>
    </prod>
    <prod num="443">
      <price currency="USD">39.99</price>
      <discount type="CLR">3.99</discount>
    </prod>
  </priceList>
</prices>
Additional content appearing in this section has been removed.
Path Expressions
The most straightforward kind of query simply selects elements or attributes from an input document. This type of query is known as a path expression. For example, the path expression:
doc("catalog.xml")/catalog/product
will select all the product elements from the catalog.xml document.
Path expressions are used to traverse an XML tree to select elements and attributes of interest. They are similar to paths used for filenames in many operating systems. They consist of a series of steps, separated by slashes, that traverse the elements and attributes in the XML documents. In this example, there are three steps:
  1. doc("catalog.xml") calls an XQuery function named doc, passing it the name of the file to open
  2. catalog selects the catalog element, the outermost element of the document
  3. product selects all the product children of catalog
The result of the query will be the four product elements, exactly as they appear (with the same attributes and contents) in the input document. Example 1-4 shows the complete result.
Example 1-4. Four product elements selected from the catalog
  <product dept="WMN">
    <number>557</number>
    <name language="en">Fleece Pullover</name>
    <colorChoices>navy black</colorChoices>
  </product>
  <product dept="ACC">
    <number>563</number>
    <name language="en">Floppy Sun Hat</name>
  </product>
  <product dept="ACC">
    <number>443</number>
    <name language="en">Deluxe Travel Bag</name>
  </product>
  <product dept="MEN">
    <number>784</number>
    <name language="en">Cotton Dress Shirt</name>
    <colorChoices>white gray</colorChoices>
    <desc>Our <i>favorite</i> shirt!</desc>
  </product>
Path expressions can also return attributes, using the @ symbol. For example, the path expression:
Additional content appearing in this section has been removed.
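As a hedged illustration of attribute selection (a sketch, not necessarily the example elided from this excerpt), the following query uses the @ symbol to select the dept attribute of every product in catalog.xml from Example 1-1, and then extracts the attribute values with the built-in data function:

```xquery
(: The inner path selects four attribute nodes, one per product element. :)
(: data() then extracts their values: WMN, ACC, ACC, MEN.                :)
data(doc("catalog.xml")/catalog/product/@dept)
```

The data call is included because attribute nodes are not standalone XML; some processors may refuse to serialize them when they are returned on their own.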
FLWORs
The basic structure of many (but not all) queries is the FLWOR expression. FLWOR (pronounced "flower") stands for "for, let, where, order by, return", the keywords used in the expression.
FLWORs, unlike path expressions, allow you to manipulate, transform, and sort your results. Example 1-5 shows a simple FLWOR that returns the names of all products in the ACC department.
Example 1-5. Simple FLWOR
Query
for $prod in doc("catalog.xml")/catalog/product
where $prod/@dept = "ACC"
order by $prod/name
return $prod/name
Results
<name language="en">Deluxe Travel Bag</name>
<name language="en">Floppy Sun Hat</name>
As you can see, the FLWOR is made up of several parts:
for
This clause sets up an iteration through the product nodes, and the rest of the FLWOR is evaluated once for each of the four products. Each time, a variable named $prod is bound to a different product element. Dollar signs are used to indicate variable names in XQuery.
where
This clause selects only products in the ACC department. This has the same effect as a predicate ([@dept = "ACC"]) in a path expression.
order by
This clause sorts the results by product name, something that is not possible with path expressions.
return
This clause indicates that the product element's name children should be returned.
The let clause (the L in FLWOR) is used to set the value of a variable. Unlike a for clause, it does not set up an iteration. Example 1-6 shows a FLWOR that returns the same result as Example 1-5. The second line is a
Additional content appearing in this section has been removed.
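A minimal sketch of how a let clause might fit into the FLWOR from Example 1-5 follows. It is an illustration under the assumption that catalog.xml is the document from Example 1-1, and it is not necessarily identical to the book's Example 1-6:

```xquery
for $prod in doc("catalog.xml")/catalog/product
(: let binds $name once per iteration; it does not iterate on its own. :)
let $name := $prod/name
where $prod/@dept = "ACC"
order by $name
return $name
```

Because $name is simply another way of writing $prod/name, this query should select the same two name elements as Example 1-5.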
Adding XML Elements and Attributes
Sometimes you want to reorganize or transform the elements in the input documents into differently named or structured elements. XML constructors can be used to create elements and attributes that appear in the query results.
Suppose you want to wrap the results of your query in a different XML vocabulary, for example XHTML. You can do this using a familiar XML-like syntax. To wrap the name elements in a ul element, for instance, you can use the query shown in Example 1-7. The ul element represents an unordered list in XHTML.
Example 1-7. Wrapping results in a new element
Query
<ul>{
  for $product in doc("catalog.xml")/catalog/product
  where $product/@dept='ACC'
  order by $product/name
  return $product/name
}</ul>
Results
<ul>
  <name language="en">Deluxe Travel Bag</name>
  <name language="en">Floppy Sun Hat</name>
</ul>
This example is the same as Example 1-5, with the addition of the first and last lines. In the query, the ul start tag, the ul end tag, and everything in between together form an element constructor. The curly braces around the content of the ul element signify that it is an expression (known as an enclosed expression) that is to be evaluated. In this case, the enclosed expression returns two elements, which become children of ul.
Any content in an element constructor that is not inside curly braces appears in the results as is. For example:
<h1>There are {count(doc("catalog.xml")//product)} products.</h1>
will return the result:
<h1>There are 4 products.</h1>
The content outside the curly braces, namely the strings "There are " and " products.", appears literally in the results, as textual content of the h1 element.
The element constructor does not need to be the outermost expression in the query. You can include element constructors at various places in your query. For example, if you want to wrap each resulting
Additional content appearing in this section has been removed.
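To illustrate an element constructor that is not the outermost expression, the sketch below places a second constructor in the return clause. Wrapping each name in an XHTML li element is an assumption made for illustration; the excerpt is cut off before it says what the book does here.

```xquery
<ul>{
  for $product in doc("catalog.xml")/catalog/product
  where $product/@dept = 'ACC'
  order by $product/name
  (: data() extracts the text of the name element, so each li holds only the product name. :)
  return <li>{data($product/name)}</li>
}</ul>
```

Against Example 1-1 this should produce a ul element with two li children, containing Deluxe Travel Bag and Floppy Sun Hat in that order.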
Functions
There are over 100 functions built into XQuery, covering a broad range of functionality. Functions can be used to manipulate strings and dates, perform mathematical calculations, combine sequences of elements, and perform many other useful jobs. You can also define your own functions, either in the query itself, or in an external library.
Both built-in and user-defined functions can be called from almost any place in a query. For instance, Example 1-9 calls the doc function in a for clause, and the data function in an enclosed expression. Chapter 8 explains how to call functions and also describes how to write your own user-defined functions. Appendix A lists all of the built-in functions and explains each of them in detail.
Additional content appearing in this section has been removed.
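The following sketch shows a user-defined function declared in the query itself and called alongside built-in functions. The function name local:net-price and the idea of subtracting the discount from the price are hypothetical, chosen only to fit the prices.xml document from Example 1-2.

```xquery
(: Hypothetical user-defined function: price after subtracting a discount. :)
declare function local:net-price($price as xs:decimal, $discount as xs:decimal)
  as xs:decimal
{
  $price - $discount
};

for $prod in doc("prices.xml")//prod
(: The built-in exists function tests for a discount child; assume 0 when it is absent. :)
let $discount := if (exists($prod/discount))
                 then xs:decimal($prod/discount)
                 else xs:decimal(0)
return <prod num="{$prod/@num}"
             net="{local:net-price(xs:decimal($prod/price), $discount)}"/>
```

The local prefix is predeclared in XQuery for functions defined in the main query, so no namespace declaration is needed for it.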
Joins
One of the major benefits of FLWORs is that they can easily join data from multiple sources. For example, suppose you want to join information from your product catalog (catalog.xml) and your order (order.xml). You want a list of all the items in the order, along with their number, name, and quantity.
The name comes from the product catalog, and the quantity comes from the order. The product number appears in both input documents, so it is used to join the two sources. Example 1-11 shows a FLWOR that performs this join.
Example 1-11. Joining multiple input documents
Query
for $item in doc("order.xml")//item
let $name := doc("catalog.xml")//product[number = $item/@num]/name
return <item num="{$item/@num}"
             name="{$name}"
             quan="{$item/@quantity}"/>
Results
<item num="557" name="Fleece Pullover" quan="1"/>
<item num="563" name="Floppy Sun Hat" quan="1"/>
<item num="443" name="Deluxe Travel Bag" quan="2"/>
<item num="784" name="Cotton Dress Shirt" quan="1"/>
<item num="784" name="Cotton Dress Shirt" quan="1"/>
<item num="557" name="Fleece Pullover" quan="1"/>
The for clause sets up an iteration through each item from the order. For each item, the let clause goes to the product catalog and gets the name of the product. It does this by finding the product element whose number child equals the item's num attribute, and selecting its name child. Because the FLWOR iterated six times, the results contain one new item element for each of the six item elements in the order document. Joins are covered in Chapter 6.
Additional content appearing in this section has been removed.
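The same join can also be sketched with two for clauses and a where clause. This is an alternative formulation for comparison, not the book's example, and it assumes the order.xml structure implied by Example 1-11 (item elements with num and quantity attributes).

```xquery
for $item in doc("order.xml")//item,
    $product in doc("catalog.xml")//product
(: Keep only the item/product pairs whose product numbers match. :)
where $item/@num = $product/number
return <item num="{$item/@num}"
             name="{$product/name}"
             quan="{$item/@quantity}"/>
```

One design difference is worth noting: this version behaves like an inner join, so an item with no matching product produces no output, whereas the let version in Example 1-11 would still return that item with an empty name attribute.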
Aggregating and Grouping Values
One common use for XQuery is to summarize and group XML data. It is sometimes useful to find the sum, average, or maximum of a sequence of values, grouped by a particular value. For example, suppose you want to know the number of items contained in an order, grouped by department. The query shown in Example 1-12 accomplishes this. It uses a for clause to iterate over the list of distinct departments, a let clause to bind $items to the item elements for a particular department, and the sum function to calculate the totals of the quantity attribute values for the items in $items.
Example 1-12. Aggregating values
Query
for $d in distinct-values(doc("order.xml")//item/@dept)
let $items := doc("order.xml")//item[@dept = $d]
order by $d
return <department name="{$d}" totQuantity="{sum($items/@quantity)}"/>
Results
<department name="ACC" totQuantity="3"/>
<department name="MEN" totQuantity="2"/>
<department name="WMN" totQuantity="2"/>
Chapter 7 covers joining, sorting, grouping, and aggregating values in detail.
Additional content appearing in this section has been removed.
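For comparison, processors that support XQuery 3.0 offer a group by clause that expresses the same aggregation more directly. The sketch below assumes such a processor and the same order.xml structure as Example 1-12; it is not part of the XQuery 1.0 language that Example 1-12 uses.

```xquery
xquery version "3.0";

for $item in doc("order.xml")//item
(: After grouping, $item is bound to all the items that share the same dept value. :)
group by $d := $item/@dept
order by $d
return <department name="{$d}" totQuantity="{sum($item/@quantity)}"/>
```

This version reads order.xml once rather than once per department, although whether that matters in practice depends on the processor's optimizer.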
Chapter 2: XQuery Foundations
This chapter provides a brief overview of the foundations of XQuery: its design, its place among XML-related standards, and its processing model. It also discusses the underlying data model behind XQuery and the use of types and namespaces in queries.
Additional content appearing in this section has been removed.
The Design of the XQuery Language
The XML Query Working Group of the World Wide Web Consortium (W3C) began work on XQuery in 1999. It used as a starting point an XML query language called Quilt, which was itself influenced by two earlier XML query languages: XQL and XML-QL.
The working group set out to design a language that would:
  • Be useful for both highly structured and semistructured documents
  • Be protocol-independent, allowing a query to be evaluated on any system with predictable results
  • Be a declarative language rather than a procedural one
  • Be strongly typed, allowing queries to be "compiled" to identify possible errors and to optimize evaluation of the query
  • Allow querying across collections of documents
  • Use and share as much as possible with appropriate W3C recommendations, such as XML 1.0, Namespaces, XML Schema, and XPath
The XQuery recommendation includes 11 separate documents and over 1,000 printed pages. These documents are listed (with links) at the public XQuery web site at http://www.w3.org/XML/Query. The various recommendation documents are generally designed to be used by implementers of XQuery software, and they vary in readability and accessibility.
Additional content appearing in this section has been removed.
XQuery in Context
XQuery is dependent on or related to a number of other technologies, particularly XPath, XSLT, SQL, and XML Schema. This section explains how XQuery fits in with these technologies.
XPath started out as a language for selecting elements and attributes from an XML document while traversing its hierarchy and filtering out unwanted content. XPath 1.0 is a fairly simple yet useful recommendation that specifies path expressions and a limited set of functions. XPath 2.0 has become much more than that, encompassing a wide variety of expressions and functions, not just path expressions.
XQuery 1.0 and XPath 2.0 overlap to a very large degree. They have the same data model and the same set of built-in functions and operators. XPath 2.0 is essentially a subset of XQuery 1.0. XQuery has a number of features that are not included in XPath, such as FLWORs and XML constructors. This is because these features are not relevant to selecting, but instead have to do with structuring or sorting query results.
The two languages are consistent in that any expression that is valid in both languages evaluates to the same value using both languages.
XPath 2.0 was built with the intention that it would be as backward-compatible with XPath 1.0 as possible. Almost all XPath 1.0 expressions are still valid in XPath 2.0, with a few slight differences in the way values are handled. These differences are identified in Chapter 25.
XSLT is a W3C language for transforming XML documents into other XML documents or, indeed, documents of any kind. There is a lot of overlap in the capabilities of XQuery and XSLT. In fact, the XSLT 2.0 standard is based upon XPath 2.0, so it has the same data model and supports all the same built-in functions and operators as XQuery, as well as many of the same expressions.
Some of the differences between XQuery and XSLT are:
  • XSLT implementations are generally optimized for transforming entire documents; they load the entire input document into memory. XQuery is optimized for selecting fragments of data, for example, from a database. It is designed to be scalable and to take advantage of database features such as indexes for optimization.
Additional content appearing in this section has been removed.
Processing Queries
A simple example of a processing model for XQuery is shown in Figure 2-1. This section describes the various components of this model.
Figure 2-1: A basic XQuery processor
Throughout this book, the term input document is used to refer to the XML data that is being queried. The data that is being queried can, in fact, take a number of different forms, for example:
  • Text files that are XML documents
  • Fragments of XML documents that are retrieved from the Web using a URI
  • A collection of XML documents that are associated with a particular URI
  • Data stored in native XML databases
  • Data stored in relational databases that have an XML frontend
  • In-memory XML documents
Some queries use a hardcoded link to the location of the input document(s), using the doc or collection function in the query. Other queries operate on a set of input data that is set by the processor at the time the query is evaluated.
Whether it is physically stored as an XML document or not, an input document must conform to other constraints on XML documents. For example, an element may not have two attributes with the same name, and element and attribute names may not contain special characters other than dashes, underscores, and periods.
An XQuery query could be contained in a text file, embedded in program code or in a query library, generated dynamically by program code, or input by the user on a command line or in a dialog box. Queries can also be composed from multiple files, known as modules
Additional content appearing in this section has been removed.
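The two ways of identifying input data described in this section can be sketched side by side. The second binding assumes that the processor has set a context item (for example, a document chosen on the command line or through an API); without one, the query is expected to raise an error.

```xquery
(: 1. Hardcoded location: the query names its own input document. :)
let $hardcoded := doc("catalog.xml")//product

(: 2. Processor-supplied input: "." is the context item set at evaluation time. :)
(:    This is an assumption about the environment, not something the query controls. :)
let $fromContext := .//product

return (count($hardcoded), count($fromContext))
```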
The XQuery Data Model
XQuery has a data model that is used to define formally all the values used within queries, including those from the input document(s), those in the results, and any intermediate values. The XQuery data model is officially known as the XQuery 1.0 and XPath 2.0 Data Model, or XDM. It is not simply the same as the Infoset (the W3C model for XML documents) because it has to support values that are not complete XML documents, such as sequences of elements (without a single outermost element) and atomic values.
Understanding the XQuery data model is analogous to understanding tables, columns, and rows when learning SQL. It describes the structure of both the inputs and outputs of the query. It is not necessary to become an expert on the intricacies of the data model to write XML queries, but it is essential to understand the basic components:
Node
An XML construct such as an element or attribute
Atomic value
A simple data value with no markup associated with it
Item
A generic term that refers to either a node or an atomic value
Sequence
An ordered list of zero, one, or more items
The relationship among these components is depicted in Figure 2-2.
Figure 2-2: Basic components of the data model
Nodes are used to represent XML constructs such as elements and attributes. Nodes are returned by many expressions, including path expressions and constructors. For example, the path expression
Additional content appearing in this section has been removed.
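The four components can be seen in a short query. The sketch below assumes the catalog.xml document from Example 1-1, which contains four number elements.

```xquery
let $nodes := doc("catalog.xml")//number   (: a sequence of four element nodes :)
let $atoms := data($nodes)                 (: a sequence of four atomic values :)
return (
  count($nodes),                 (: 4 :)
  count($atoms),                 (: 4 :)
  $nodes[1] instance of node(),  (: true: the first item is a node :)
  $atoms[1] instance of node(),  (: false: the first item is an atomic value :)
  count(()),                     (: 0: the empty sequence has no items :)
  count((1, (2, 3), ()))         (: 3: nested sequences are flattened :)
)
```

The last line shows that sequences never contain other sequences: (1, (2, 3), ()) is the same sequence as (1, 2, 3).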
Types
XQuery is a strongly typed language, meaning that each function and operator expects its arguments or operands to be of a particular type. This section provides some basic information about types that is useful to any query author. More detailed coverage of types in XQuery can be found in Chapter 11.
The XQuery type system is based on that of XML Schema. XML Schema has built-in simple types representing common datatypes such as xs:integer, xs:string, and xs:date. The xs: prefix is used to indicate that these types are defined in the XML Schema specification. Types are assigned to items in the input document during schema validation, which is optional. If no schema is used, the items are untyped.
The type system of XQuery is not as rigid as it may sound, since there are a number of type conversions that happen automatically. Most notably, items that are untyped are automatically cast to the type required by a particular operation. Casting involves converting a value from one type to another following specified rules. For example, the function call:
doc("order.xml")/order/substring(@num, 1, 4)
does not require that the num attribute be declared to be of type xs:string. If it is untyped, it is cast to xs:string. In fact, if you do not plan to use a schema, you can in most cases use XQuery without any regard for types. However, if you do use a schema and the num attribute is declared to be of type xs:integer, you cannot use the preceding substring example without explicitly converting the value of the num attribute to xs:string, as in:
doc("order.xml")/order/substring(xs:string(@num), 1, 4)
Additional content appearing in this section has been removed.
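When it is not known in advance whether a value can be converted, the castable as expression can test this before the cast is attempted. The following is a minimal sketch that reuses the order.xml path from the examples above; the decision to add 1 or to take a substring is purely illustrative.

```xquery
let $num := doc("order.xml")/order/@num
return
  (: castable as reports whether the cast would succeed, without raising an error. :)
  if ($num castable as xs:integer)
  then xs:integer($num) + 1
  else substring(xs:string($num), 1, 4)
```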
Namespaces
Namespaces are used to identify the vocabulary to which XML elements and attributes belong, and to disambiguate names from different vocabularies. This section provides a brief overview of the use of namespaces in XQuery for those who expect to be writing queries with basic use of namespaces. More detailed coverage of namespaces, including a complete explanation of the use of namespaces in XML documents, can be found in Chapter 10.
Many of the names used in a query are namespace-qualified, including those of:
  • Elements and attributes from an input document
  • Elements and attributes in the query results
  • Functions, variables, and types
Example 2-3 shows an input document that contains a namespace declaration, a special attribute whose name starts with xmlns. The prod prefix is mapped to the namespace http://datypic.com/prod. This means that any element or attribute name in the document that is prefixed with prod is in that namespace.
Example 2-3. Input document with namespaces (prod_ns.xml)
<prod:product xmlns:prod="http://datypic.com/prod">
  <prod:number>563</prod:number>
  <prod:name language="en">Floppy Sun Hat</prod:name>
</prod:product>
Example 2-4 shows a query (and its results) that might be used to select the products from the input document.
Example 2-4. Querying with namespaces
Query
declare namespace prod = "http://datypic.com/prod";
for $product in doc("prod_ns.xml")/prod:product
return $product/prod:name
Results
<prod:name xmlns:prod="http://datypic.com/prod"
           language="en">Floppy Sun Hat</prod:name>
The namespace declaration that appears in the first line of the query maps the namespace http://datypic.com/prod to the prefix prod
Additional content appearing in this section has been removed.
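As an alternative sketch (not taken from this excerpt), a query can declare a default element namespace so that unprefixed element names in path expressions refer to that namespace. It assumes the prod_ns.xml document from Example 2-3.

```xquery
(: Unprefixed element names in this query now mean names in this namespace. :)
declare default element namespace "http://datypic.com/prod";

for $product in doc("prod_ns.xml")/product
return $product/name
```

This works because names are matched on the namespace itself rather than on the prefix; the prefix used in a query does not have to match the prefix used in the input document. The default element namespace does not apply to attribute names, so an attribute such as language would still be referenced as @language.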
Chapter 3: Expressions: XQuery Building Blocks
The basic unit of evaluation in the XQuery language is the expression. A query contains expressions that can be made up of a number of sub-expressions, which may themselves be composed from other sub-expressions. This chapter explains the XQuery syntax, and covers the most basic types of expressions that can be used in queries: literals, variables, function calls, and comments.
A query can range in complexity from a single expression such as 2+3, to a complex composite expression like a FLWOR. Within a FLWOR, there may be other expressions, such as $prodDept = "ACC", which is a comparison expression, and doc("catalog.xml")/catalog/product, which is a path expression. Within these expressions, there are further expressions, such as "ACC", which is a literal, and $prodDept, which is a variable reference.
The categories of expressions available are summarized in Table 3-1, along with the number of the chapter that covers them. Every expression evaluates to a sequence, which may be a single atomic value, a single node, the empty sequence, or multiple atomic values and/or nodes.
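The kinds of expressions named above can be seen together in one small query. This is a sketch that assumes the catalog.xml document from Example 1-1; the variable name $prodDept follows the text, while the rest of the query is illustrative.

```xquery
(: This is a comment; the processor ignores it. :)
let $prodDept := "ACC"                           (: "ACC" is a string literal :)
for $prod in doc("catalog.xml")/catalog/product  (: a function call inside a path expression :)
where $prod/@dept = $prodDept                    (: a comparison using a variable reference :)
return $prod/name
```

The whole query is a single FLWOR expression, and each commented line contains one or more of the sub-expressions it is composed from.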
Table 3-1: Categories of expressions
Category
Description
