Regular Expression Syntax
It would not be unreasonable to assume that some specification defines how regular expressions are constructed. Unfortunately, there isn’t one. Regular expressions have been incorporated as a feature in a number of tools over the years, with varying degrees of consistency and completeness. The result is a cart-before-the-horse scenario, in which utilities and languages have defined their own flavor of regular expression syntax, each with its own extensions and idiosyncrasies. Formally defining the regular expression syntax came later, as did efforts to make it more consistent.
Regular expressions are defined by arranging strings of text, or patterns. Those patterns are composed of two types of characters, literals (plain text or literal text) and metacharacters.
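The difference between a literal and a metacharacter can be seen in a minimal sketch like the one below. It assumes a POSIX-style grep is available; the sample lines and variable names are invented for illustration. The dot is used here as a representative metacharacter (it matches any single character), and the -F option asks grep to treat the whole pattern as literal text.

```shell
# Hypothetical sample input: three short lines.
sample=$(printf '%s\n' 'a.c' 'abc' 'ac')

# As a regular expression, the dot is a metacharacter that matches
# any single character, so both "a.c" and "abc" are selected.
regex_hits=$(printf '%s\n' "$sample" | grep 'a.c')

# With -F, grep treats the pattern as plain literal text,
# so only the line containing an actual dot is selected.
literal_hits=$(printf '%s\n' "$sample" | grep -F 'a.c')

printf 'regex match:\n%s\n' "$regex_hits"
printf 'literal match:\n%s\n' "$literal_hits"
```

The line "ac" is selected by neither command, because the dot must match exactly one character in the first case and a real dot in the second.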
Like the special file globbing characters, regular expression metacharacters take on a special meaning in the context of the tool in which they’re used. A few metacharacters are generally considered part of the “extended set,” specifically those introduced into egrep after grep was created.
The egrep command on Linux systems is simply a wrapper that runs grep -E, telling grep to use its extended regular expression capabilities instead of the basic ones. Examples of metacharacters include the ^ symbol, which means “the beginning of a line,” and the $ symbol, which means “the end of a line.” A complete listing of metacharacters follows in Tables 6-8 through 6-11 ...
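As a rough illustration of the two anchors and of the basic versus extended distinction, the following sketch runs grep against a few made-up lines. The sample text and variable names are assumptions for the example only, and the extended pattern uses grouping and alternation as representative extended metacharacters.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'error: disk full' 'no error here' 'fatal error')

# ^ anchors the match to the beginning of a line.
starts=$(printf '%s\n' "$sample" | grep '^error')

# $ anchors the match to the end of a line.
ends=$(printf '%s\n' "$sample" | grep 'error$')

# With -E, the parentheses and | act as grouping and alternation.
extended=$(printf '%s\n' "$sample" | grep -E '^(error|fatal)')

# Without -E, the same characters are ordinary literals, so no line matches.
basic=$(printf '%s\n' "$sample" | grep '^(error|fatal)')

printf 'starts with error: %s\n' "$starts"
printf 'ends with error:   %s\n' "$ends"
printf 'extended matches:\n%s\n' "$extended"
printf 'basic matches: [%s]\n' "$basic"
```

The line "no error here" contains the word but is selected by neither anchored pattern, since the word appears in the middle of the line.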