Regular Expression Glossary

anchor

Specifies a location in a line or string. For example, the caret or circumflex character (^) signifies the beginning of a line or string of characters, and the dollar sign character ($), the end of a line or string.
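
As a small illustration of anchors, the following sketch uses Python's re module; the sample text and variable names are made up for this example. It assumes Python's default behavior, in which ^ and $ anchor to the whole string unless the re.MULTILINE flag is given.

```python
import re

# Hypothetical sample text: two lines, each starting and ending with "cat".
text = "cat concat\ncatalog cat"

# Without flags, ^ and $ anchor only at the very start and end of the string.
start_of_string = re.findall(r"^cat", text)
end_of_string = re.findall(r"cat$", text)

# With re.MULTILINE, ^ and $ also anchor at the start and end of each line.
start_of_lines = re.findall(r"^cat", text, re.MULTILINE)
end_of_lines = re.findall(r"cat$", text, re.MULTILINE)

print(start_of_string, end_of_string)  # ['cat'] ['cat']
print(start_of_lines, end_of_lines)    # ['cat', 'cat'] ['cat', 'cat']
```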

alternation

Separating a list of regular expressions with a vertical bar (|) character, indicating or. In other words, match any one of the regular expressions separated by | characters. In some applications that use basic regular expressions (BREs), such as GNU grep or sed, the | is preceded by a backslash, as in \|. See also basic regular expressions.
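
A minimal sketch of alternation, again in Python's re module, where | needs no backslash; the sample sentences and variable names are invented for illustration.

```python
import re

# Match any one of three alternatives.
animals = re.findall(r"cat|dog|bird", "The dog chased the cat; the fish watched.")

# Alternation inside a non-capturing group limits the choice to part of the pattern.
spellings = re.findall(r"gr(?:a|e)y", "gray grey groy")

print(animals)    # ['dog', 'cat']
print(spellings)  # ['gray', 'grey']
```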

ASCII

American Standard Code for Information Interchange. A 128-character encoding scheme for English (Latin) characters developed in the 1960s. See also Unicode.

assertions

See zero-width assertions.

atom

See metacharacter.

atomic group

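A group, written (?>…), that turns off backtracking into itself: once the regular expression inside the group has matched, the regex processor discards the saved backtracking positions for that group, so a later failure cannot make the group give back characters. See also backtracking, groups.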
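
The effect of an atomic group can be sketched in Python, with one caveat: the re module accepts the (?>…) syntax only from Python 3.11 onward. On older versions, the sketch below falls back to a common emulation that pairs a lookahead with a backreference. The patterns and the sample string are made up for illustration.

```python
import re
import sys

# An ordinary group: a+ first takes "aaa", then gives one "a" back so "ab" can match.
ordinary = re.compile(r"(?:a+)ab")

# An atomic group: a+ takes "aaa" and may not give anything back.
if sys.version_info >= (3, 11):
    atomic = re.compile(r"(?>a+)ab")
else:
    # Emulation for older Pythons: the lookahead captures a+ once and
    # the backreference consumes exactly that text.
    atomic = re.compile(r"(?=(a+))\1ab")

ordinary_match = ordinary.search("aaab")
atomic_match = atomic.search("aaab")

print(ordinary_match.group())  # aaab
print(atomic_match)            # None
```

The atomic version fails because every a is consumed inside the group, leaving none for the literal a that follows it.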

backreference

Refers to the text matched by an earlier group captured with parentheses, using a reference in the form of \1, \2, and so forth.
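
For instance, a backreference can find a doubled word. This is a minimal Python sketch; the pattern, sample sentence, and variable names are chosen only for illustration.

```python
import re

# (\w+) captures a word; \1 then requires the same text to appear again.
doubled = re.compile(r"\b(\w+) \1\b")

sentence = "Paris in the the spring"
match = doubled.search(sentence)

# In the replacement string, \1 inserts the captured word once.
fixed = doubled.sub(r"\1", sentence)

print(match.group())   # the the
print(match.group(1))  # the
print(fixed)           # Paris in the spring
```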

backtracking

Stepping back, character by character, through an attempted match to find a successful match. A greedy match gives back characters this way; a possessive match never does, and a lazy match instead starts with as few characters as possible and extends the match only as needed. Catastrophic backtracking occurs when a regex processor makes an enormous number of attempts to find a match and consumes a vast amount, often most, of the computing resources available. One way to avoid catastrophic backtracking is with atomic ...
