The Regular Expression Bestiary

Before we dive into the rules for interpreting regular expressions, let’s see what some patterns look like. Most characters in a regular expression simply match themselves. If you string several characters in a row, they must match in order, just as you’d expect. So if you write the pattern match:

    /Frodo/

you can be sure that the pattern won’t match unless the string contains the substring “Frodo” somewhere. (A substring is just a part of a string.) The match could be anywhere in the string, just as long as those five characters occur somewhere, next to each other and in that order.
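
To make the "anywhere, adjacent, and in order" rule concrete, here is a minimal sketch that tries the pattern /Frodo/ against a few strings. The helper name has_frodo and the sample strings are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string contains the five
# characters F, r, o, d, o next to each other and in that order.
sub has_frodo {
    my ($string) = @_;
    return $string =~ /Frodo/ ? 1 : 0;
}

# Hypothetical sample strings, chosen only to exercise the rule.
my @samples = (
    "Frodo lives",          # match at the start
    "Mr. Frodo Baggins",    # match in the middle
    "frodo",                # wrong case: no match
    "F r o d o",            # characters not adjacent: no match
    "odorF",                # wrong order: no match
);

for my $sample (@samples) {
    my $verdict = has_frodo($sample) ? "matches" : "does not match";
    print "'$sample' $verdict\n";
}
```

Only the first two strings match, because only they contain the substring exactly as written in the pattern.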

Other characters don’t match themselves but “misbehave” in some way. We call these metacharacters. (All metacharacters are naughty in their own right, but some are so bad that they also cause nearby characters to misbehave.)

Here are the miscreants:

\ | ( ) [ { ^ $ * + ? .

Metacharacters are actually very useful and have special meanings inside patterns; we’ll tell you all those meanings as we go along. But we do want to reassure you that you can always match any of these 12 characters literally by putting a backslash in front of each. For example, backslash is itself a metacharacter, so to match a literal backslash, you’d backslash the backslash: \\.
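
The backslashing rule can be checked mechanically. The following is a small sketch, not a definitive test: it puts a backslash in front of each of the 12 metacharacters and confirms that the resulting pattern matches the character itself. The helper name count_escaped_matches is invented for illustration.

```perl
use strict;
use warnings;

# The 12 metacharacters, each as a single-quoted one-character string.
# ('\\' in single quotes is one backslash.)
my @meta = ('\\', '|', '(', ')', '[', '{', '^', '$', '*', '+', '?', '.');

# Hypothetical helper: counts how many of the given characters match
# themselves once a backslash is put in front of them in a pattern.
sub count_escaped_matches {
    my @chars = @_;
    my $count = 0;
    for my $char (@chars) {
        my $pattern = '\\' . $char;      # one backslash, then the character
        $count++ if $char =~ /$pattern/;
    }
    return $count;
}

my $matched = count_escaped_matches(@meta);
print "$matched of ", scalar(@meta), " match literally\n";

# An escaped dot is literal; an unescaped dot matches other characters too.
print "escaped dot rejects 3x14\n"   unless "3x14" =~ /3\.14/;
print "unescaped dot accepts 3x14\n" if     "3x14" =~ /3.14/;
```

Note that the backslash case builds the two-character pattern \\, which is exactly the "backslash the backslash" form described above.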

You see, backslash is one of those characters that makes other characters misbehave. It just works out that when you make a misbehaving metacharacter misbehave, it ends up behaving—a double negative, as it were. So ...
