The Match Variables

So far, when we’ve put parentheses into patterns, they’ve been used only for their ability to group parts of a pattern together. But parentheses also trigger the regular expression engine’s memory. The memory holds the part of the string matched by the part of the pattern inside parentheses. If there is more than one pair of parentheses, there will be more than one memory. Each regular expression memory holds part of the original string, not part of the pattern.

Since these variables hold strings, they are scalar variables; in Perl, they have names like $1 and $2. There are as many of these variables as there are pairs of memory parentheses in the pattern. As you’d expect, $4 means the string matched by the fourth set of parentheses.
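As a minimal sketch of that numbering, the pattern below has four pairs of parentheses, so a successful match sets $1 through $4. The sample string and the names $text and @captured are hypothetical, chosen only for illustration:

```perl
# Hypothetical sample data; not taken from the text above.
my $text = "2024-03-15 ERROR disk full";

my @captured;
if ($text =~ /(\d+)-(\d+)-(\d+) (\w+)/) {
  # Four pairs of parentheses, counted left to right: $1, $2, $3, $4.
  @captured = ($1, $2, $3, $4);
  print "year=$1 month=$2 day=$3 level=$4\n";  # year=2024 month=03 day=15 level=ERROR
}
```

Here $4 holds ERROR, the text matched by the fourth set of parentheses. The rest of the line (" disk full") is not in any memory because no parentheses cover it.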

These match variables are a big part of the power of regular expressions because they let us pull out the parts of a string:

    $_ = "Hello there, neighbor";
    if (/\s(\w+),/) {             # memorize the word between space and comma
      print "the word was $1\n";  # the word was there
    }

Or you could use more than one memory at once:

    $_ = "Hello there, neighbor";
    if (/(\S+) (\S+), (\S+)/) {
      print "words were $1 $2 $3\n";
    }

That tells us that the words were Hello there neighbor. Notice that there’s no comma in the output. Because the comma is outside of the memory parentheses in the pattern, there is no comma in memory two. Using this technique, we can choose what we want in the memories, as well as what we want to leave out.
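To see how the placement of the parentheses decides what gets memorized, here is a small sketch that runs two patterns against the same string. The two differ only in whether the comma sits outside or inside the second pair of parentheses. The variable names $without and $with are hypothetical:

```perl
$_ = "Hello there, neighbor";

my ($without, $with);
if (/(\S+) (\S+), (\S+)/) {
  $without = $2;   # comma is outside the parentheses, so it is left out
}
if (/(\S+) (\S+,) (\S+)/) {
  $with = $2;      # comma is inside the parentheses, so it is memorized
}

print "outside: '$without'\n";  # outside: 'there'
print "inside: '$with'\n";      # inside: 'there,'
```

Both patterns match the same text overall; only the contents of memory two change.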

You could have an empty ...
