Classic Perl Character Class Shortcuts
Since the beginning, Perl has provided a number of character class shortcuts. These are listed in Table 5-11. All of them are backslashed alphabetic metasymbols, and, in each case, the uppercase version is the negation of the lowercase version.
These match much more than you might think, because they normally work on the full Unicode range, not on ASCII alone (and for negated classes, even beyond Unicode). In any case, the normal meanings are a superset of the old ASCII or locale meanings. For explanations of the properties and the legacy POSIX forms, see "POSIX-Style Character Classes" later in this chapter. To keep the old ASCII meanings, you can always say use re "/a" for that scope, or put a /a or two on an individual pattern.
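To make the difference concrete, here is a minimal sketch, assuming Perl 5.14 or later (where the /a modifier is available). The helper matches_class() is a hypothetical name invented for this illustration; it only reports whether a single character matches \d or \w, with or without /a.

```perl
use v5.14;
use warnings;

# matches_class() is a hypothetical helper, not part of Perl: it returns 1 if
# $char matches the requested shortcut (\d or \w), optionally under /a.
sub matches_class {
    my ($char, $class, $ascii_only) = @_;
    my %normal = (d => qr/\A\d\z/,  w => qr/\A\w\z/);
    my %ascii  = (d => qr/\A\d\z/a, w => qr/\A\w\z/a);
    my $re = $ascii_only ? $ascii{$class} : $normal{$class};
    return $char =~ $re ? 1 : 0;
}

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $greek_alpha  = "\x{03B1}";   # GREEK SMALL LETTER ALPHA

printf "U+0663 as \\d: normal=%d, /a=%d\n",
    matches_class($arabic_three, "d", 0), matches_class($arabic_three, "d", 1);
printf "U+03B1 as \\w: normal=%d, /a=%d\n",
    matches_class($greek_alpha, "w", 0), matches_class($greek_alpha, "w", 1);
printf "'7'    as \\d: normal=%d, /a=%d\n",
    matches_class("7", "d", 0), matches_class("7", "d", 1);
```

The two non-ASCII characters should match only under the normal meanings, while the ASCII digit matches either way.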
Table 5-11. Classic character classes
| Symbol | Meaning | Normal Property | /a Property | /a Enumerated | Legacy [:POSIX:] |
|---|---|---|---|---|---|
| \d | Digit | \p{X_POSIX_Digit} | \p{POSIX_Digit} | [0-9] | [:digit:] |
| \D | Nondigit | \P{X_POSIX_Digit} | \P{POSIX_Digit} | [^0-9] | [:^digit:] |
| \w | Word character | \p{X_POSIX_Word} | \p{POSIX_Word} | [_A-Za-z0-9] | [:word:] |
| \W | Non-(word character) | \P{X_POSIX_Word} | \P{POSIX_Word} | [^_A-Za-z0-9] | [:^word:] |
| \s | Whitespace | \p{X_Perl_Space} | \p{Perl_Space} | [\t\n\f\r ] | [:space:] [a] |
| \S | Nonwhitespace | \P{X_Perl_Space} | \P{Perl_Space} | [^\t\n\f\r ] | [:^space:] |
| \h | Horizontal whitespace character | \p{Horiz_Space} | \p{Horiz_Space} | Many | [:blank:] |
| \H | Non-(Horizontal whitespace character) | \P{Horiz_Space} | \P{Horiz_Space} | Many | [:^blank:] |
| \v | Vertical whitespace character | \p{Vert_Space} | \p{Vert_Space} | Many | (none) |
| \V | Non-(Vertical whitespace ... |
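The whitespace rows of the table can be explored with a short sketch like the following, again assuming Perl 5.14 or later. The helper classify() is a hypothetical name used only here; it records which of \h, \v, and \s a single character matches, and repeats the \h and \s tests under /a.

```perl
use v5.14;
use warnings;

# classify() is a hypothetical helper for illustration: it returns a hash
# reference of 1/0 flags for one character against several shortcuts.
sub classify {
    my ($char) = @_;
    return {
        h       => ($char =~ /\A\h\z/)  ? 1 : 0,   # horizontal whitespace
        v       => ($char =~ /\A\v\z/)  ? 1 : 0,   # vertical whitespace
        s       => ($char =~ /\A\s\z/)  ? 1 : 0,   # whitespace, normal meaning
        h_ascii => ($char =~ /\A\h\z/a) ? 1 : 0,   # \h under /a
        s_ascii => ($char =~ /\A\s\z/a) ? 1 : 0,   # \s under /a
    };
}

my %samples = (
    "TAB"              => "\t",
    "NEWLINE"          => "\n",
    "EM SPACE U+2003"  => "\x{2003}",
);

for my $name (sort keys %samples) {
    my $r = classify($samples{$name});
    printf "%-16s h=%d v=%d s=%d h/a=%d s/a=%d\n",
        $name, @{$r}{qw(h v s h_ascii s_ascii)};
}
```

The EM SPACE line is the interesting one: \s stops matching it under /a, whereas \h has the same property in both columns of the table and so should keep matching.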