Character Class Expressions
Character class expressions, which are enclosed in square brackets, indicate a choice among several characters. These characters can be listed singly, expressed as a range of characters, or expressed as a combination of the two.
Single Characters and Ranges
To specify a choice of several characters, you can simply list them inside square brackets. For example, [def] matches d or e or f. To match multiple occurrences of these letters, you can use a quantifier with a character class expression, as in [def]*, which will match not only defdef, but eddfefd as well. The characters listed can also be any of the escapes described earlier in this chapter. The expression [\p{Ll}\d] matches either a lowercase letter or a digit.
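As a rough illustration of the [def] examples, the sketch below uses Python's re module as a stand-in for an XQuery processor. That choice is an assumption made only to get a runnable example: XQuery uses the XML Schema regular expression flavor, and the two differ in places (for instance, re does not support category escapes such as \p{Ll}, so that expression is not shown). The helper name matches_whole is hypothetical.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# A character class on its own matches exactly one of the listed characters.
single = [matches_whole(r"[def]", ch) for ch in "defg"]

# With the * quantifier, any sequence made up of d, e, and f matches.
repeated = [matches_whole(r"[def]*", s) for s in ("defdef", "eddfefd", "defx")]

print(single)    # one result per character in "defg"
print(repeated)  # one result per test string
```

Only the last entry in each list is expected to fail, because g and x are not listed in the character class.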
It is also possible to specify a range of characters by separating the starting and ending characters with a hyphen. For example, [a-z] matches any letter from a to z. The endpoints of the range must be single characters or single-character escapes (not multi-character escapes such as \d).
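A range and its endpoint rule can be sketched as follows. Python's re module is used here as an assumed stand-in for XQuery's regular expression flavor; it happens to reject a multi-character escape as a range endpoint as well, but the exact error reporting of an XQuery processor will differ.

```python
import re

# [a-z] matches any single character from a through z.
lower = re.compile(r"[a-z]")

in_range = [bool(lower.fullmatch(ch)) for ch in ("a", "m", "z")]
out_of_range = [bool(lower.fullmatch(ch)) for ch in ("A", "5", "-")]

# A multi-character escape such as \d cannot serve as a range endpoint.
try:
    re.compile(r"[a-\d]")
    rejected = False
except re.error:
    rejected = True

print(in_range, out_of_range, rejected)
```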
You can specify more than one range in the same character class expression, which means that it matches a character in any of the ranges. The expression [a-zA-Z0-9] matches one character that is either between a and z, or between A and Z, or a digit from 0 to 9. Unicode code points are used to determine whether a character is in the range.
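To make the code point comparison concrete, the sketch below checks [a-zA-Z0-9] in two ways: with a regular expression and with an explicit comparison of code points. Python's re module is an assumed stand-in for XQuery's regex flavor, and the names RANGES, in_class, and in_ranges_by_code_point are hypothetical.

```python
import re

alnum = re.compile(r"[a-zA-Z0-9]")

# The same three ranges, written as (first, last) endpoint pairs.
RANGES = [("a", "z"), ("A", "Z"), ("0", "9")]


def in_class(ch: str) -> bool:
    """Test a single character against the character class expression."""
    return alnum.fullmatch(ch) is not None


def in_ranges_by_code_point(ch: str) -> bool:
    """Test a single character by comparing Unicode code points directly."""
    return any(ord(lo) <= ord(ch) <= ord(hi) for lo, hi in RANGES)


# "\u00e9" is e with an acute accent; its code point lies outside a-z.
samples = ["q", "Q", "7", "_", "\u00e9"]
by_regex = [in_class(ch) for ch in samples]
by_code_point = [in_ranges_by_code_point(ch) for ch in samples]

print(by_regex)
print(by_code_point)
```

The two lists agree, which reflects that membership in a range depends only on where a character's code point falls, not on whether it looks like a letter.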
Ranges and single characters can be combined in any order. For example, [abc0-9]
matches either a letter a
,