Regular Expression Syntax

It would not be unreasonable to assume that some specification defines how regular expressions are constructed. Unfortunately, there isn’t one that every tool follows. Regular expressions have been incorporated as a feature in a number of tools over the years, with varying degrees of consistency and completeness. The result is a cart-before-the-horse scenario, in which utilities and languages have defined their own flavors of regular expression syntax, each with its own extensions and idiosyncrasies. Formal definitions of the syntax came later, as did efforts to make it more consistent.

A regular expression is written as a string of text called a pattern. Each pattern is composed of two types of characters: literals (plain or literal text) and metacharacters.
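The difference between literals and metacharacters can be seen in a short sketch. It assumes Python's `re` module purely for illustration. That module uses its own Perl-influenced flavor rather than grep's syntax, but for the single `.` metacharacter (match any one character) the behavior is the same. The word list is made-up sample data.

```python
import re

# Sample data (hypothetical), tested one word at a time.
words = ["cat", "cot", "cut", "ct", "c.t"]

# A pattern made only of literal characters matches that exact text.
literal = re.compile(r"cat")
# The "." metacharacter matches any single character (except a newline).
wildcard = re.compile(r"c.t")
# re.escape() backslash-escapes metacharacters, so "." becomes a literal dot.
escaped = re.compile(re.escape("c.t"))

literal_hits = [w for w in words if literal.search(w)]
wildcard_hits = [w for w in words if wildcard.search(w)]
escaped_hits = [w for w in words if escaped.search(w)]

print(literal_hits)   # ['cat']
print(wildcard_hits)  # ['cat', 'cot', 'cut', 'c.t']
print(escaped_hits)   # ['c.t']
```

Note that `ct` never matches `c.t`: the `.` must consume exactly one character.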

Like the special file globbing characters, regular expression metacharacters take on a special meaning in the context of the tool in which they’re used. A few metacharacters are generally considered part of the “extended set,” specifically those introduced into egrep after grep was created.
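In extended syntax (egrep, or grep -E), characters such as `+`, `?`, and `|` act as metacharacters without a preceding backslash; in grep's basic syntax an unescaped `+` or `?` is matched literally. The sketch below again assumes Python's `re` module as a stand-in, since it treats these three characters the way extended syntax does. The test strings are hypothetical.

```python
import re

# "+" means one or more of the preceding item.
one_or_more = re.compile(r"^ab+c$")
# "?" means zero or one of the preceding item.
optional = re.compile(r"^colou?r$")
# "|" separates alternatives; the parentheses group them.
either = re.compile(r"^(cat|dog)$")

plus_hits = [s for s in ["ac", "abc", "abbbc"] if one_or_more.search(s)]
optional_hits = [s for s in ["color", "colour", "colouur"] if optional.search(s)]
either_hits = [s for s in ["cat", "dog", "cow"] if either.search(s)]

print(plus_hits)      # ['abc', 'abbbc']
print(optional_hits)  # ['color', 'colour']
print(either_hits)    # ['cat', 'dog']
```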

The egrep command on Linux systems is simply a wrapper that runs grep -E, which tells grep to use its extended regular expression capabilities instead of the basic ones. Examples of metacharacters include the ^ symbol, which means “the beginning of a line,” and the $ symbol, which means “the end of a line.” A complete listing of metacharacters follows in Tables 6-8 through 6-11 ...
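The effect of the ^ and $ anchors can be sketched with a small line filter. The `grep` function here is a hypothetical helper written in Python for illustration, not the real command; the sample lines mimic /etc/passwd entries but are invented. Because each list element holds a single line, Python's `^` and `$` anchor to the start and end of that line, as they do in grep.

```python
import re

# Invented, passwd-style sample lines.
passwd_lines = [
    "root:x:0:0:root:/root:/bin/bash",
    "operator:x:11:0:operator:/root:/sbin/nologin",
    "alice:x:1000:1000::/home/alice:/bin/bash",
]

def grep(pattern, lines):
    """Return the lines containing a match for pattern (hypothetical grep-like helper)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

contains_root = grep(r"root", passwd_lines)     # "root" anywhere on the line
starts_with_root = grep(r"^root", passwd_lines) # "root" only at the beginning
ends_with_bash = grep(r"bash$", passwd_lines)   # "bash" only at the end

print(len(contains_root))     # 2 (root and operator lines)
print(len(starts_with_root))  # 1 (root line only)
print(len(ends_with_bash))    # 2 (root and alice lines)
```

The unanchored pattern also matches the operator line because it contains "/root"; adding ^ restricts the match to lines that begin with "root", much as `grep '^root' /etc/passwd` does on a real system.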
