Regular Expression Glossary


anchor

Specifies a location in a line or string. For example, the caret or circumflex character (^) signifies the beginning of a line or string of characters, and the dollar sign character ($) signifies the end of a line or string.
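As a minimal sketch of how anchors behave, the following uses Python's standard re module. The sample strings and variable names are illustrative assumptions, not taken from the glossary.

```python
import re

# Hypothetical sample lines, chosen only to illustrate the two anchors.
lines = ["regex is fun", "I like regex", "regex"]

# ^ anchors the pattern to the beginning of the string.
starts = [s for s in lines if re.search(r"^regex", s)]

# $ anchors the pattern to the end of the string.
ends = [s for s in lines if re.search(r"regex$", s)]

# Using both anchors requires the whole string to match.
whole = [s for s in lines if re.search(r"^regex$", s)]

print(starts)  # ['regex is fun', 'regex']
print(ends)    # ['I like regex', 'regex']
print(whole)   # ['regex']
```

By default, Python's ^ and $ refer to the whole string; with the re.MULTILINE flag they also match at the start and end of each line.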


alternation

Separates a list of regular expressions with the vertical bar (|) character, which means or. In other words, the pattern matches any one of the regular expressions separated by | characters. In some applications that use basic regular expressions (BREs), such as grep or sed, the | is preceded by a backslash, as in \|. See also basic regular expressions.
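A short sketch of alternation, again assuming Python's re module, where | needs no backslash. The words and the sample sentence are made up for illustration.

```python
import re

# Match any one of three alternatives separated by |.
pattern = re.compile(r"cat|dog|bird")

# Hypothetical sample sentence.
found = pattern.findall("The dog chased the cat past a fish.")

print(found)  # ['dog', 'cat']
```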


ASCII

American Standard Code for Information Interchange. A 128-character encoding scheme for English (Latin) characters developed in the 1960s. See also Unicode.
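To make the 128-character range concrete, here is a small sketch in Python. It relies only on built-in functions; the sample strings are illustrative assumptions.

```python
# Every ASCII character has a code point from 0 through 127.
codes = [ord(c) for c in "Regex"]
print(codes)  # [82, 101, 103, 101, 120]

# str.isascii() reports whether all characters fall in that range.
plain_ok = "cafe".isascii()
accented_ok = "café".isascii()  # é lies outside ASCII

# Encoding a non-ASCII character as ASCII raises an error.
try:
    "café".encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(plain_ok, accented_ok, encode_failed)  # True False True
```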


assertions

See zero-width assertions.


atom

See metacharacter.

atomic group

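A grouping, written (?>…), that turns off backtracking into the group: once the regular expression inside it has matched, the regex processor will not go back into the group to try a different match, so the overall match fails if the rest of the pattern cannot succeed. See also backtracking, groups.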
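The effect of an atomic group can be sketched in Python as follows. Two assumptions are worth stating: the native (?>…) syntax is accepted by Python's re module only from version 3.11 on, so it is guarded by a version check, and the lookahead-plus-backreference form (?=(a+))\1 is a common emulation rather than true atomic group syntax.

```python
import re
import sys

text = "aaa"

# Ordinary greedy group: a+ first takes all three a's, then gives one
# back so the final a can match.
greedy = re.search(r"a+a", text)

# Emulated atomic group: the lookahead captures the a's and \1 consumes
# them; the engine does not retry the lookahead, so the final a never matches.
emulated = re.search(r"(?=(a+))\1a", text)

# Native atomic group, available in Python 3.11 and later only.
native = re.search(r"(?>a+)a", text) if sys.version_info >= (3, 11) else None

print(greedy.group())  # aaa
print(emulated)        # None
print(native)          # None
```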


backreference

Refers to the text matched by an earlier group captured with parentheses, using a reference of the form \1, \2, and so forth.
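A brief sketch of a backreference in use, assuming Python's re module. The pattern finds a word that is immediately repeated; the sample sentence is an illustrative assumption.

```python
import re

# (\w+) captures a word; \1 must then match that same text again.
doubled = re.compile(r"\b(\w+) \1\b")

sentence = "Paris in the the spring"  # hypothetical sample
m = doubled.search(sentence)

print(m.group(0))  # the the
print(m.group(1))  # the

# The same reference works in a replacement string to drop the repeat.
fixed = doubled.sub(r"\1", sentence)
print(fixed)  # Paris in the spring
```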


backtracking

Stepping back, character by character, through an attempted match to find a successful match. A greedy match backtracks by giving back characters it has already consumed; a lazy match instead extends itself one character at a time, and a possessive match never gives anything back. Catastrophic backtracking occurs when a regex processor makes thousands or even millions of attempts to find a match and consumes most of the available computing resources. One way to avoid catastrophic backtracking is with atomic ...
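The difference between greedy and lazy matching can be sketched in Python. The sample text is an assumption; the point is that the greedy .* first runs to the end of the string and then steps back to the last quotation mark, while the lazy .*? stops at the first one it reaches.

```python
import re

text = 'say "yes" or "no"'  # hypothetical sample

# Greedy: .* consumes as much as possible, then backtracks until a
# closing quotation mark can match.
greedy = re.search(r'".*"', text).group()

# Lazy: .*? consumes as little as possible, growing one character at a time.
lazy = re.search(r'".*?"', text).group()

print(greedy)  # "yes" or "no"
print(lazy)    # "yes"
```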
