Regex Metacharacters, Modes, and Constructs
The metacharacters and metasequences shown here represent most available types of regular expression constructs and their most common syntax. However, syntax and availability vary by implementation.
Character representations
Many implementations provide shortcuts to represent characters that may be difficult to input. (See MRE 115–118.)
- Character shorthands
Most implementations have specific shorthands for the alert, backspace, escape character, form feed, newline, carriage return, horizontal tab, and vertical tab characters. For example, \n is often a shorthand for the newline character, which is usually LF (012 octal), but can sometimes be CR (015 octal), depending on the operating system. Confusingly, many implementations use \b to mean both backspace and word boundary (position between a “word” character and a nonword character). For these implementations, \b means backspace in a character class (a set of possible characters to match in the string), and word boundary elsewhere.
- Octal escape: \num
Represents a character corresponding to a two- or three-digit octal number. For example, \015\012 matches an ASCII CR/LF sequence.
- Hex and Unicode escapes: \xnum, \x{num}, \unum, \Unum
Represent characters corresponding to hexadecimal numbers. Four-digit and larger hex numbers can represent the range of Unicode characters. For example, \x0D\x0A matches an ASCII CR/LF sequence.
- Control characters: \cchar
Corresponds to ASCII control characters encoded ...
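As an illustration of the escapes described above, here is a minimal sketch using Python's re module. Python is an assumption made for this example, not an implementation the text singles out, and the sample strings and variable names are hypothetical. The sketch covers only the octal, \xnum, and \unum forms and the two meanings of \b; it does not exercise \x{num} or \cchar, whose availability varies by implementation.

```python
import re

# Hypothetical sample text containing an ASCII CR/LF pair.
crlf_text = "line one\r\nline two"

# Raw strings pass the backslashes through to the regex engine,
# so the engine itself interprets the octal, hex, and Unicode escapes.
octal_match = re.search(r"\015\012", crlf_text)
hex_match = re.search(r"\x0D\x0A", crlf_text)
unicode_match = re.search(r"\u000D\u000A", crlf_text)

# Outside a character class, \b is a word boundary:
# "cat" inside "concat" has no boundary before it, so it is skipped.
boundary_hits = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a character class, \b is the backspace character (U+0008).
# The subject string is NOT raw, so its "\b" is a literal backspace.
backspace_hits = re.findall(r"[\b]", "a\bb")

print(octal_match.span(), hex_match.span(), unicode_match.span())
print(boundary_hits, backspace_hits)
```

All three escape forms locate the same two-character CR/LF sequence, while the two \b patterns match entirely different things depending on whether \b appears inside a character class.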