The Regular Expression Bestiary
Before we dive into the rules for interpreting regular expressions, let’s see what some patterns look like. Most characters in a regular expression simply match themselves. If you string several characters in a row, they must match in order, just as you’d expect. So if you write the pattern match:
/Frodo/
you can be sure that the pattern won’t match unless the string contains the substring “Frodo” somewhere. (A substring is just a part of a string.) The match could be anywhere in the string, just as long as those five characters occur somewhere, next to each other and in that order.
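The excerpt doesn't tie `/Frodo/` to a particular language, so the sketch below uses Python's `re` module as an assumption. For plain literal characters it behaves the same way the text describes. It shows that the pattern finds “Frodo” at the start, middle, or end of a string, and fails when the five letters are separated, in a different case, or out of order. The helper name `contains_frodo` is hypothetical.

```python
import re

# A pattern made only of ordinary characters matches exactly that run of
# characters, in that order, anywhere inside the target string.
pattern = re.compile(r"Frodo")

def contains_frodo(text: str) -> bool:
    # Hypothetical helper. re.search scans the whole string for the substring;
    # re.match would only try at position 0.
    return pattern.search(text) is not None

print(contains_frodo("Frodo Baggins"))          # True  (at the start)
print(contains_frodo("Mr. Frodo, of the Shire"))  # True  (in the middle)
print(contains_frodo("Bilbo's heir is Frodo"))  # True  (at the end)
print(contains_frodo("F r o d o"))              # False (letters not adjacent)
print(contains_frodo("frodo"))                  # False (case differs)
print(contains_frodo("Fordo"))                  # False (wrong order)

# The match object reports where the substring was found: characters 4..8.
print(pattern.search("Mr. Frodo, of the Shire").span())  # (4, 9)
```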
Other characters don’t match themselves but “misbehave” in some way. We call these metacharacters. (All metacharacters are naughty in their own right, but some are so bad that they also cause other nearby characters to misbehave.)
Here are the miscreants:
\ | ( ) [ { ^ $ * + ? .
Metacharacters are actually very useful and have special meanings inside patterns; we’ll tell you all those meanings as we go along. But we do want to reassure you that you can always match any of these 12 characters literally by putting a backslash in front of each. For example, backslash is itself a metacharacter, so to match a literal backslash, you’d backslash the backslash: \\.
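The following sketch checks the escaping rule with Python's `re` module, which is an assumption on our part. It confirms that each of the 12 metacharacters, preceded by a backslash, matches only itself. It then contrasts an unescaped `.` (a wildcard) with an escaped `\.` (a literal period). Finally, it finds a literal backslash with the doubled `\\`. Python raw strings (`r"..."`) keep backslashes exactly as written, so the pattern text in the code matches what the prose shows. The name `METACHARACTERS` and the sample string are illustrative only.

```python
import re

# The twelve metacharacters listed above, as a plain Python string.
# ("\\" in ordinary string syntax is a single backslash character.)
METACHARACTERS = "\\|()[{^$*+?."

for ch in METACHARACTERS:
    escaped = "\\" + ch  # put a backslash in front of the metacharacter
    # Escaped, the pattern matches that one character literally.
    assert re.fullmatch(escaped, ch) is not None, ch

# Unescaped, "." is a wildcard that matches any character except newline...
print(bool(re.fullmatch(r"a.c", "abc")))   # True
# ...but escaped as "\." it matches only a literal period.
print(bool(re.fullmatch(r"a\.c", "abc")))  # False
print(bool(re.fullmatch(r"a\.c", "a.c")))  # True

# To match one literal backslash, backslash the backslash: the pattern \\
sample = r"C:\Shire"                       # raw string: C, :, \, S, h, i, r, e
print(re.search(r"\\", sample).start())    # 2
```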
You see, backslash is one of those characters that makes other characters misbehave. It just works out that when you make a misbehaving metacharacter misbehave, it ends up behaving—a double negative, as it were. So ...