Regex Metacharacters, Modes, and Constructs

The metacharacters and metasequences shown here represent most available types of regular expression constructs and their most common syntax. However, syntax and availability vary by implementation.

Character representations

Many implementations provide shortcuts to represent characters that may be difficult to input. (See MRE 115–118.)

Character shorthands

Most implementations have specific shorthands for the alert, backspace, escape character, form feed, newline, carriage return, horizontal tab, and vertical tab characters. For example, \n is often a shorthand for the newline character, which is usually LF (012 octal) but can sometimes be CR (015 octal), depending on the operating system. Confusingly, many implementations use \b to mean both backspace and word boundary (the position between a “word” character and a nonword character). For these implementations, \b means backspace inside a character class (a set of possible characters to match in the string) and word boundary everywhere else.
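A minimal sketch of these shorthands in Python's standard re module, which is one of the implementations where \b changes meaning: it is a word boundary in the pattern body but the backspace character (U+0008) inside a character class. Other engines may behave differently; the sample strings are illustrative only.

```python
import re

# Outside a character class, \b is a zero-width word boundary:
# "cat" inside "concat" is not matched because no boundary precedes it.
words = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a character class, Python's re reads \b as backspace (U+0008).
# The subject string is an ordinary literal, so "\b" here is a real backspace.
backspaces = re.findall(r"[\b]", "ab\bc\b")

# \t and \n are the shorthands for horizontal tab and newline (LF).
fields = re.split(r"\t", "a\tb\tc")
lines = re.split(r"\n", "one\ntwo")

print(words, backspaces, fields, lines)
```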

Octal escape: \num

Represents a character corresponding to a two- or three-digit octal number. For example, \015\012 matches an ASCII CR/LF sequence.
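The CR/LF example can be tried in Python's re, with one caveat: Python treats an escape as octal only when it starts with 0 or has exactly three octal digits. A two-digit form such as \12 would be read as a backreference to group 12 instead, so the sketch below uses the leading-zero three-digit form.

```python
import re

# \015 is CR (octal 15 = decimal 13); \012 is LF (octal 12 = decimal 10).
crlf = re.compile(r"\015\012")

m = crlf.search("line one\r\nline two")
print(m.start())  # index of the CR that ends "line one"
```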

Hex and Unicode escapes: \xnum, \x{num}, \unum, \Unum

Represent characters corresponding to hexadecimal numbers. Four-digit and larger hex numbers can represent the range of Unicode characters. For example, \x0D\x0A matches an ASCII CR/LF sequence.
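Which forms are accepted depends on the engine. As one example, Python's re accepts \xhh (exactly two hex digits), \uhhhh (exactly four), and \Uhhhhhhhh (exactly eight), but not the brace form \x{num} used by Perl. The sketch below assumes Python 3.3 or later, where \u and \U are recognized in patterns.

```python
import re

crlf = re.fullmatch(r"\x0D\x0A", "\r\n")           # \xhh: two hex digits
e_acute = re.search(r"caf\u00E9", "un café")        # \uhhhh: four hex digits
grin = re.search(r"\U0001F600", "hi \U0001F600")    # \Uhhhhhhhh: eight hex digits

# Python's re rejects the brace form, reporting an incomplete \x escape.
brace_error = None
try:
    re.compile(r"\x{263A}")
except re.error as exc:
    brace_error = str(exc)

print(crlf is not None, e_acute.group(), grin.start(), brace_error)
```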

Control characters: \cchar

Corresponds to ASCII control characters encoded ...
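Not every engine supports \cchar. Python's re, for instance, rejects \c as a bad escape. The sketch below uses a hypothetical helper, expand_control_escapes, that rewrites \c followed by a letter into an equivalent \xhh escape. It assumes the Perl convention that the control character's code is the uppercased character's code XOR 0x40, so \cH becomes 0x08 (backspace) and \cM becomes 0x0D (CR).

```python
import re

def expand_control_escapes(pattern: str) -> str:
    r"""Hypothetical helper: rewrite \cX (X a letter) as \xhh for Python's re.

    Assumes the Perl convention code = ord(X.upper()) ^ 0x40.
    Simplification: an escaped backslash followed by "cX" is not special-cased.
    """
    def to_hex(m: re.Match) -> str:
        code = ord(m.group(1).upper()) ^ 0x40
        return "\\x%02X" % code  # returned text is inserted literally by re.sub

    return re.sub(r"\\c([A-Za-z])", to_hex, pattern)

# Python's own re refuses \c outright.
rejected = False
try:
    re.compile(r"\cH")
except re.error:
    rejected = True

translated = expand_control_escapes(r"a\cHb")   # becomes a\x08b
match = re.search(translated, "a\bb")
print(rejected, translated, match is not None)
```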
