Chapter 19. Regular Expressions
Regular expressions are patterns that describe strings. They can be used as arguments to four XQuery built-in functions: to determine whether a string value matches a particular pattern (matches), to replace parts of a string that match a pattern (replace), to tokenize strings based on a delimiter pattern (tokenize), and to split a string into matching and non-matching parts (analyze-string). This chapter explains the regular expression syntax used by XQuery.
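As a quick orientation, the following sketch calls each of the four functions once. The input strings and patterns are made up for illustration, and the version declaration assumes a processor that supports XQuery 3.0 or later, which analyze-string requires.

```xquery
xquery version "3.1";

(: matches: does the string contain a run of digits anywhere? :)
matches("order 66", "[0-9]+"),        (: true :)

(: replace: swap every hyphen for a slash :)
replace("2024-01-15", "-", "/"),      (: "2024/01/15" :)

(: tokenize: split on a comma followed by optional whitespace :)
tokenize("a, b,c", ",\s*"),           (: ("a", "b", "c") :)

(: analyze-string: returns an fn:analyze-string-result element that
   wraps "ab" in fn:non-match and "12" in fn:match :)
analyze-string("ab12", "[0-9]+")
```

Note that matches succeeds if the pattern is found anywhere in the string; it does not require the whole string to match.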
The Structure of a Regular Expression
The regular expression syntax of XQuery is based on that of XML Schema, with some additions. Regular expressions, also known as regexes, can be composed of a number of different parts: atoms, quantifiers, and branches.
Atoms
An atom is the most basic unit of a regular expression. It might describe a single character, such as d, or an escape sequence that represents one or more characters, like \s or \p{Lu}. It could also be a character class expression that represents a range or choice of several characters, such as [a-z]. These kinds of atoms are described later in this chapter.
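To make the kinds of atoms concrete, here is a minimal sketch that tests each one with matches. The test strings are illustrative assumptions, not examples from the chapter.

```xquery
xquery version "3.1";

(: a single-character atom :)
matches("dog", "d"),          (: true :)

(: an escape sequence for whitespace :)
matches("a b", "\s"),         (: true :)

(: a category escape for any uppercase letter :)
matches("Hello", "\p{Lu}"),   (: true: "H" is uppercase :)
matches("hello", "\p{Lu}"),   (: false: no uppercase letter :)

(: a character class expression covering a range :)
matches("x", "[a-z]")         (: true :)
```

Backslashes need no doubling here because XQuery string literals do not treat the backslash as an escape character.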
Quantifiers
Atoms may indicate required, optional, or repeating strings. The number of times a matching string may appear is indicated by a quantifier, which appears directly after an atom. For example, to indicate that the letter d must appear one or more times, you can use the expression d+, where the + means “one or more.” The different quantifiers are listed in Table 19-1.
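The d+ expression described above can be tried out as follows. This is a small sketch with made-up input strings; the replace call is included only to show that the quantifier consumes the whole run of d characters as a single match.

```xquery
xquery version "3.1";

(: at least one "d" is present :)
matches("add", "d+"),          (: true :)

(: no "d" at all, so "one or more" fails :)
matches("abc", "d+"),          (: false :)

(: the run "ddd" is matched once and replaced as a unit :)
replace("adddb", "d+", "-")    (: "a-b" :)
```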