Character Classes

The [...] construct lists a set of characters (a character class), any one of which will match at that position. Brackets are often used when capitalization is uncertain in a match.
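
As a minimal sketch of the capitalization case, the following uses a hypothetical word and hypothetical sample strings (neither comes from the original text) to show a class that accepts either case of a single letter:

```perl
use strict;
use warnings;

# Hypothetical sample strings; only the case of the first letter should vary.
my @samples = ('There it is', 'over there', 'THERE');

# [tT] accepts either 't' or 'T' in that one position; "here" must be lowercase.
my @matched = grep { /[tT]here/ } @samples;

print "$_\n" for @matched;   # prints the first two strings only
```

The class covers just one position, so 'THERE' is rejected. A fully case-insensitive match would need the /i modifier instead.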


A dash (-) may be used to indicate a range of characters in a character class:

/[a-zA-Z]/;  # Match any single letter
/[0-9]/;     # Match any single digit

To put a literal dash in the list, escape it with a backslash (\-) or place it first or last in the class, where it cannot be read as a range.
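
The following sketch compares ways of writing a literal dash in a class. Besides the backslash form, Perl also treats a dash placed first or last in the brackets as literal. The variable names and the sample string are illustrative assumptions:

```perl
use strict;
use warnings;

# Three equivalent ways to allow digits plus a literal dash.
my $escaped = qr/^[0-9\-]+$/;   # dash escaped with a backslash
my $first   = qr/^[-0-9]+$/;    # dash placed first in the class
my $last    = qr/^[0-9-]+$/;    # dash placed last in the class

my $phone = '555-0100';         # hypothetical input

# Record 1 for each pattern that matches, 0 otherwise.
my @results = map { $phone =~ $_ ? 1 : 0 } ($escaped, $first, $last);

print "@results\n";             # 1 1 1
```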

By placing a ^ as the first element in the brackets, you create a negated character class, i.e., it matches any character not in the list. For example:

/[^A-Z]/;    # Matches any character other than an uppercase letter
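
One common use of a negated class is stripping everything except a wanted set of characters. This is a small sketch with a hypothetical input string:

```perl
use strict;
use warnings;

my $input = 'Order #42: 3 items';   # hypothetical input

# [^0-9] matches any character that is not a digit, so deleting
# every such character leaves only the digits behind.
(my $digits = $input) =~ s/[^0-9]//g;

print "$digits\n";   # 423

# The caret negates only when it comes first; elsewhere it is literal.
my $caret_is_literal = '^' =~ /[a^]/ ? 1 : 0;   # 1
```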

Some common character classes have their own predefined escape sequences for your programming convenience:

Code    Matches
\d      A digit, same as [0-9]
\D      A nondigit, same as [^0-9]
\w      A word character (alphanumeric or underscore), same as [a-zA-Z_0-9]
\W      A non-word character, same as [^a-zA-Z_0-9]
\s      A whitespace character, same as [ \t\n\r\f]
\S      A non-whitespace character, same as [^ \t\n\r\f]
\C      Match a character (byte)
\pP     Match P-named (Unicode) property
\PP     Match non-P
\X      Match extended Unicode sequence
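
To illustrate the digit, word, and whitespace classes described above, here is a short sketch that applies the standard escapes \d, \w, and \s to a hypothetical ASCII input line:

```perl
use strict;
use warnings;

my $line = "id_7  price 1299\tok";   # hypothetical input

my @words  = $line =~ /\w+/g;        # runs of word characters
my @digits = $line =~ /\d+/g;        # runs of digits
my @fields = split /\s+/, $line;     # split on runs of whitespace

printf "%d %d %d\n", scalar @words, scalar @digits, scalar @fields;   # 4 2 4
```

The underscore counts as a word character, so "id_7" stays in one piece for \w+ while \d+ picks out only the "7".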

While Perl implements lc() and uc(), which you can use to put words or characters into the proper case, you can do the same with escape sequences in a pattern or double-quoted string:

Code    Effect
\l      Lowercase the next character
\u      Uppercase the next character
\L      Lowercase until \E
\U      Uppercase until \E
\Q      Disable pattern metacharacters until \E
\E      End case modification
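
As a rough sketch of how the case-modification and quoting escapes behave in double-quoted strings and patterns, the following uses hypothetical variable names and sample values:

```perl
use strict;
use warnings;

my $name = 'perl';                       # hypothetical input

my $capitalized = "\u$name";             # "Perl": only the next character changes
my $shouted     = "\U$name\E rocks";     # "PERL rocks": uppercase until \E
my $lowered     = "\LNUT\E Shell";       # "nut Shell": lowercase until \E

# \Q...\E quotes metacharacters, so the dot in $literal matches only a dot.
my $literal      = 'a.c';
my $quoted_hit   = 'abc' =~ /^\Q$literal\E\z/ ? 1 : 0;   # 0
my $unquoted_hit = 'abc' =~ /^$literal\z/     ? 1 : 0;   # 1

print "$capitalized | $shouted | $lowered | $quoted_hit $unquoted_hit\n";
```

Wrapping interpolated text in \Q...\E is a common precaution when a variable's contents should be matched literally rather than as a pattern.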

These elements match ...
