Chapter 19. Regular Expressions
Regular expressions are patterns that describe strings. They can be used as arguments to four XQuery built-in functions to determine whether a string value matches a particular pattern (matches), to replace parts of string that match a pattern (replace), to tokenize strings based on a delimiter pattern (tokenize), and to split a string into matching and non-matching parts (analyze-string). This chapter explains the regular expression syntax used by XQuery.
The Structure of a Regular Expression
The regular expression syntax of XQuery is based on that of XML Schema, with some additions. Regular expressions, also known as regexes, can be composed of a number of different parts: atoms, quantifiers, and branches.
Atoms
An atom is the most basic unit of a regular expression. It might describe a single character, such as d, or an escape sequence that represents one or more characters, like \s or \p{Lu}. It could also be a character class expression that represents a range or choice of several characters, such as [a-z]. These kinds of atoms are described later in this chapter.
Quantifiers
Atoms may indicate required, optional, or repeating strings. The number of times a matching string may appear is indicated by a quantifier, which appears directly after an atom. For example, to indicate that the letter d must appear one or more times, you can use the expression d+, where the + means “one or more.” The different quantifiers are listed in Table 19-1.
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access