Representing Individual Characters

A single character can be used to represent itself in a regular expression. In this case, it is known as a normal character. For example, the regular expression d matches the letter d, and def matches the string def, as you might expect. Each of the three single characters (d, e, and f) is its own atom, and it can have a quantifier associated with it. For example, the regular expression d+ef matches the strings def, ddef, dddef, etc.
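The behavior of normal characters and quantifiers can be checked with the built-in fn:matches function. The following is a minimal sketch; the input strings are made up for illustration. Keep in mind that fn:matches returns true if any substring of the input matches the regular expression.

```xquery
(: One d followed by "ef"; returns true. :)
fn:matches("def", "d+ef"),
(: Three d's followed by "ef"; returns true. :)
fn:matches("dddef", "d+ef"),
(: The + quantifier requires at least one d, so this returns false. :)
fn:matches("ef", "d+ef")
```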

Certain characters have a special meaning in a regular expression and must be escaped in order to be taken literally. For example, the asterisk (*) is treated as a quantifier unless it is escaped. These characters, called metacharacters, must be escaped (with the exception of most of them when they appear within square brackets): ., \, ?, *, +, |, ^, $, {, }, (, ), [, and ]. The backslash and the square brackets themselves must still be escaped inside square brackets.
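The difference between an escaped and an unescaped metacharacter might be illustrated as follows. This is a sketch with made-up input strings, again using fn:matches. The backslash has no special meaning in an XQuery string literal, so it is passed to the regular expression as is.

```xquery
(: Unescaped, the period matches any character; returns true. :)
fn:matches("3x5", "3.5"),
(: Escaped, the period matches only a literal period; returns false. :)
fn:matches("3x5", "3\.5"),
(: The escaped period matches the literal period; returns true. :)
fn:matches("3.5", "3\.5"),
(: Within square brackets, * and + need no escape; returns true. :)
fn:matches("2*3", "2[*+]3")
```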

These characters are escaped by preceding them with a backslash. This is referred to as a single-character escape because it matches exactly one character. For convenience, there are three additional single-character escapes for the whitespace characters tab, line feed, and carriage return. Table 18-4 lists the single-character escapes.

Table 18-4. Single-character escapes

Escape sequence    Character
\\                 \
\|                 |
\.                 .
\-                 -
\^                 ^
\$ [a]             $
\?                 ?
\*                 *
\+                 +
\{                 {
\}                 }
\(                 (
\)                 )
\[                 [
\]                 ]
\n                 Line feed (#xA)
\r                 Carriage return (#xD)
\t                 Tab (#x9)

[a]
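A few of the escapes in Table 18-4 can be tried out as shown below. This is only a sketch with made-up input strings; it assumes the standard fn:matches and fn:replace functions and uses character references (&#9; and &#10;) to place a tab and a line feed in the input.

```xquery
(: \t matches the tab character between a and b; returns true. :)
fn:matches("a&#9;b", "a\tb"),
(: \n matches the line feed, which is replaced by a space;
   returns "line one line two". :)
fn:replace("line one&#10;line two", "\n", " "),
(: \$ matches a literal dollar sign rather than the end of the string;
   returns true. :)
fn:matches("cost: $5", "\$5")
```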
