Quick Reference
Character Representations
\x{nn} | Two-digit hexadecimal code: \x{20} represents the space character
\x{nnnn} | Four-digit hexadecimal code (Unicode): \x{0020} represents the space character
\N{unicode name} | Unicode name: \N{Latin small letter a with ogonek} represents ą. The name itself is case-insensitive, but the match is case-sensitive. Thus, both \N{latin small letter a with ogonek} and \N{Latin Small letter A with ogonek} match the lowercase ą only.
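The behaviour described above can be tried outside the GREP dialog. What follows is only a rough analogue in Python, not the GREP engine this reference describes: Python writes the hex escapes without braces, and the variable names are made up for illustration. It shows that a Unicode name lookup ignores the case of the name while the resulting match stays case-sensitive.

```python
import re
import unicodedata

# Python spells the hex escapes without braces: \x20 and \u0020.
space_two_digit = "\x20"
space_four_digit = "\u0020"

# unicodedata.lookup ignores the case of the name it is given.
lower_name = unicodedata.lookup("latin small letter a with ogonek")
mixed_name = unicodedata.lookup("Latin Small letter A with ogonek")

# Build a pattern from the looked-up character (U+0105).
# The name was case-insensitive; the match is not.
ogonek = re.compile(re.escape(lower_name))

# "z\u0105b" contains the lowercase letter, "Z\u0104B" the uppercase one.
matches_lowercase = ogonek.search("z\u0105b") is not None
matches_uppercase = ogonek.search("Z\u0104B") is not None
```

Here both lookups return the same character, and the pattern finds the lowercase letter but not its uppercase counterpart.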
Character Classes 1: Standard Classes
I call these classes “standard” for lack of a better term. They were part of the first implementations of GREP, and of the three types of class, to this day the standard classes are the easiest to use.
[char] | A single character or a group of characters
[^char] | Any character except the listed character or group of characters
. | Any character except a paragraph break
\w | Word character: letters, digits, and underscore
\W | Non-word character
\l | Lowercase letter
\L | Non-lowercase letter
\u | Uppercase letter
\U | Non-uppercase letter
\d | Digit
\D | Non-digit
\h | Horizontal space: all spaces and tabs
\H | Non-horizontal space character
\s | Whitespace character: all spaces, tabs, and returns
\S | Non-whitespace character
\v | Vertical space: break characters (paragraph break, forced line break, and page, column, and frame breaks)
\V | Any character that is not vertical space
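Several of these classes can be tried in any regex engine. The sketch below uses Python's re module as a stand-in, which is an assumption on my part and not the GREP engine this reference covers. Python has no \l, \u, or \h, so the bracketed classes [a-z], [A-Z], and [ \t] serve as ASCII-only approximations, and the sample string is invented.

```python
import re

sample = "Chapter 12:\tpage_7 of 30"

# \w and \d behave much as the table describes.
words = re.findall(r"\w+", sample)
numbers = re.findall(r"\d+", sample)

# [char] lists the characters to accept; [^char] lists those to exclude.
vowels = re.findall(r"[aeiou]", sample)
punctuation = re.findall(r"[^\w\s]", sample)

# Approximations for classes Python lacks (ASCII only, an assumption):
# [A-Z] stands in for \u, and [ \t] for \h.
uppercase = re.findall(r"[A-Z]", sample)
horizontal = re.findall(r"[ \t]", sample)

# The dot stops at a line break, comparable to the paragraph break above.
lines = re.findall(r".+", "first paragraph\nsecond paragraph")
```

The negated class [^\w\s] is a common way to pick out punctuation: it accepts whatever is neither a word character nor whitespace.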
Character Classes 2: Posix Expressions
There is much overlap between the Posix class and the standard class. Most Posix expressions listed here can ...