Representing Groups of Characters

Sometimes characters fall into convenient groups, such as decimal digits or punctuation characters. Three different kinds of escapes can be used to represent a group of characters: multi-character escapes, category escapes, and block escapes. Like single-character escapes, they all start with a backslash.

Multi-Character Escapes

Multi-character escapes, listed in Table 18-7, represent groups of related characters. They are called multi-character escapes because each one allows a choice among multiple characters. However, each escape matches only one character in the input string. To match a sequence of several such characters, use a quantifier such as +; for example, \d+ matches one or more digits.
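The one-character rule can be seen in a small sketch. It uses Python's re module as a stand-in for an XQuery regular expression engine. This is an assumption for illustration only: the two dialects differ in places, although \d matches any Unicode decimal digit in both. whole_string_matches is a hypothetical helper. In XQuery itself, a comparable whole-string test would use fn:matches with the anchors ^ and $.

```python
import re


def whole_string_matches(pattern: str, text: str) -> bool:
    # Hypothetical helper: True only if `pattern` matches ALL of `text`.
    # re.fullmatch behaves like wrapping an XQuery pattern in ^ and $.
    return re.fullmatch(pattern, text) is not None


# \d on its own stands for exactly one character...
print(whole_string_matches(r"\d", "7"))              # True
print(whole_string_matches(r"\d", "2024"))           # False: four digits, one escape
# ...so a quantifier is needed to accept a run of digits.
print(whole_string_matches(r"\d+", "2024"))          # True
# A digit from another script still counts as one \d character
# (U+0661 and U+0662 are ARABIC-INDIC DIGIT ONE and TWO).
print(whole_string_matches(r"\d+", "\u0661\u0662"))  # True
```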

Table 18-7. Multi-character escapes

Escape  Meaning
\s      A whitespace character, as defined by XML (space, tab, carriage return, or line feed)
\S      A character that is not a whitespace character
\d      A decimal digit (0 to 9) or a digit from another script, such as an Arabic-Indic digit; that is, any character in the Unicode category Nd
\D      A character that is not a decimal digit
\w      A "word" character, that is, any character that is not in one of the Unicode categories Punctuation, Separators, or Other
\W      A nonword character, that is, any character in one of the Unicode categories Punctuation, Separators, or Other
\i      A character that is allowed as the first character of an XML name, i.e., a letter, an underscore (_), or a colon (:); the "i" stands for "initial"
\I      A character that cannot be the first character ...
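The definitions in Table 18-7 can be written out as tests on a single character. The Python sketch below is illustrative only. The function names are hypothetical, and it relies on the standard unicodedata module for Unicode general categories. Its \i test approximates XML's notion of "letter" with the Unicode letter categories (L*), which is not exactly how XML defines name characters. The sketch also contrasts the table's escapes with Python's own re escapes. Python's \s is broader than XML whitespace. Python's \w includes the underscore but excludes symbols such as $.

```python
import re
import unicodedata

# Hypothetical helpers: each tests ONE character against the definition
# of a multi-character escape as given in Table 18-7.

XML_WHITESPACE = {" ", "\t", "\r", "\n"}  # space, tab, CR, LF only


def is_s(ch: str) -> bool:
    # \s: one of the four XML whitespace characters
    return ch in XML_WHITESPACE


def is_d(ch: str) -> bool:
    # \d: any Unicode decimal digit, i.e. general category Nd
    return unicodedata.category(ch) == "Nd"


def is_w(ch: str) -> bool:
    # \w: not Punctuation (P*), Separator (Z*), or Other (C*)
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def is_i(ch: str) -> bool:
    # \i, approximated: XML defines "letter" with its own character ranges;
    # the Unicode letter categories (L*) are used here as a stand-in.
    return ch in ("_", ":") or unicodedata.category(ch).startswith("L")


# Each uppercase escape matches exactly the characters that its
# lowercase counterpart does not.
def is_S(ch: str) -> bool:
    return not is_s(ch)


def is_D(ch: str) -> bool:
    return not is_d(ch)


def is_W(ch: str) -> bool:
    return not is_w(ch)


def is_I(ch: str) -> bool:
    return not is_i(ch)


print(is_d("\u0663"))  # True: ARABIC-INDIC DIGIT THREE is in category Nd
print(is_w("$"))       # True: "$" is a currency symbol (Sc), not P, Z, or C
print(is_w("_"))       # False: "_" is connector punctuation (Pc)
print(is_s("\u00a0"))  # False: no-break space is not XML whitespace
# Python's own escapes draw the lines differently:
print(re.fullmatch(r"\w", "_") is not None)       # True
print(re.fullmatch(r"\s", "\u00a0") is not None)  # True
```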
