Chapter 18. Regular Expressions

Regular expressions are patterns that describe strings. They can be used as arguments to three XQuery built-in functions: to determine whether a string value matches a particular pattern (matches), to replace parts of a string that match a pattern (replace), and to tokenize strings based on a delimiter pattern (tokenize). This chapter explains the regular expression syntax used by XQuery.
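As a minimal sketch of the three functions named above, the following query calls each one with a simple pattern. The input strings and patterns are illustrative assumptions, not examples taken from this chapter; the expected results appear in the comments.

```xquery
(: matches: tests whether the string contains a match for the pattern :)
fn:matches("query", "q"),            (: true :)

(: replace: substitutes every match of the pattern :)
fn:replace("2023-01-15", "-", "/"),  (: "2023/01/15" :)

(: tokenize: splits the string wherever the delimiter pattern matches :)
fn:tokenize("a, b, c", ",\s*")       (: ("a", "b", "c") :)
```

In each call the regular expression is passed as an ordinary string, and it is always the second argument.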

The Structure of a Regular Expression

The regular expression syntax of XQuery is based on that of XML Schema, with some additions. Regular expressions, also known as regexes, can be composed of a number of different parts: atoms, quantifiers, and branches.

Atoms

An atom is the most basic unit of a regular expression. It might describe a single character, such as d, or an escape sequence that represents one or more characters, like \s or \p{Lu}. It could also be a character class expression that represents a range or choice of several characters, such as [a-z]. These kinds of atoms are described later in this chapter.
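The three kinds of atoms mentioned above can be tried out with the matches function. This is a sketch using made-up input strings; note that matches returns true when the pattern matches any part of the string, not only the whole string.

```xquery
(: a single character as an atom :)
fn:matches("dog", "d"),          (: true: the string contains d :)

(: escape sequences as atoms :)
fn:matches("a b", "\s"),         (: true: the string contains whitespace :)
fn:matches("XQuery", "\p{Lu}"),  (: true: X and Q are uppercase letters :)
fn:matches("xquery", "\p{Lu}"),  (: false: no uppercase letter :)

(: a character class expression as an atom :)
fn:matches("123", "[a-z]")       (: false: no lowercase letter a through z :)
```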

Quantifiers

Atoms may indicate required, optional, or repeating strings. The number of times a matching string may appear is indicated by a quantifier, which appears directly after an atom. For example, to indicate that the letter d must appear one or more times, you can use the expression d+, where the + means "one or more." The different quantifiers are listed in Table 18-1.
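To illustrate the d+ expression described above, the following sketch compares the atom d with and without the + quantifier. The input strings are assumptions chosen for illustration.

```xquery
(: d+ requires at least one d somewhere in the string :)
fn:matches("add", "d+"),          (: true :)
fn:matches("abc", "d+"),          (: false :)

(: with no quantifier, each d is matched separately :)
fn:replace("adddb", "d", "-"),    (: "a---b" :)

(: with +, the whole run of d characters is one match :)
fn:replace("adddb", "d+", "-")    (: "a-b" :)
```

The two replace calls show the practical difference: the quantifier applies to the atom directly before it, so d+ consumes the entire run of d characters as a single match.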

Table 18-1. Kinds of quantifiers

Quantifier	Number of occurrences
`none`	1

XQuery by Priscilla Walmsley
