Matching from Where the Last Pattern Left Off

Problem

You want to match again from where the last pattern left off.

This is a useful approach to take when repeatedly extracting data in chunks from a string.

Solution

Use a combination of the /g match modifier, the \G pattern anchor, and the pos function.

Discussion

If you use the /g modifier on a match, the regular expression engine remembers the position in the string where that match ended. The next time you match against the same string with /g, the engine starts looking from this remembered position. This lets you use a while loop to extract the information you want from the string.

while (/(\d+)/g) {
    print "Found $1\n";
}
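The loop above assumes the string being searched is already in $_. As a minimal self-contained sketch, the version below uses a made-up sample string in a named variable ($text is only an illustrative name) and records pos after each successful match to show the remembered position advancing:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $text = "order 12, lot 7, bin 305";

my @found;
while ($text =~ /(\d+)/g) {
    # pos($text) is the offset just past the end of the latest match.
    push @found, "$1 ends at " . pos($text);
}
print "$_\n" for @found;
```

Each pass through the loop resumes where the previous match stopped, so the offsets reported by pos increase from one number to the next.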

You can also use \G in your pattern to anchor it to the end of the previous match. For example, if you had a number stored in a string with leading blanks, you could change each leading blank into the digit zero this way:

$n = "   49 here";
$n =~ s/\G /0/g;
print $n;

00049 here

You can also make good use of \G in a while loop. Here we use \G to parse a comma-separated list of numbers (e.g., "3,4,5,9,120"):

while (/\G,?(\d+)/g) {
    print "Found number $1\n";
}
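To see what the \G anchor contributes in a loop like this one, the sketch below runs the same pattern with and without it. The input string and variable names are assumptions made for illustration; the input deliberately contains a non-numeric item partway through:

```perl
use strict;
use warnings;

# Hypothetical input: a numeric list interrupted by a stray word.
my $list = "3,4,oops,5";

# Anchored: every match must begin exactly where the previous one ended,
# so the loop stops as soon as the next item is not a number.
my @anchored;
while ($list =~ /\G,?(\d+)/g) {
    push @anchored, $1;
}

# Unanchored: the engine is free to skip ahead over "oops" to reach "5".
# (The failed match above reset pos, so this search starts at offset 0.)
my @unanchored = $list =~ /,?(\d+)/g;

print "anchored:   @anchored\n";
print "unanchored: @unanchored\n";
```

With \G the loop collects only the leading run of numbers, which is usually what you want when parsing a list that must be contiguous.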

By default, when your match fails (when we run out of numbers in the examples, for instance) the remembered position is reset to the start. If you don’t want this to happen, perhaps because you want to continue matching from that position but with a different pattern, use the modifier /c with /g:

$_ = "The year 1752 lost 10 days on the 3rd of September";
while (/(\d+)/gc) ...
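As a rough sketch of how /c can be put to use, the code below runs a /gc loop over the same sentence and then continues from the preserved position with a different, \G-anchored pattern. This is one possible continuation under stated assumptions, not necessarily the one the original text goes on to show; the variable names and the follow-up pattern are illustrative:

```perl
use strict;
use warnings;

my $text = "The year 1752 lost 10 days on the 3rd of September";

my @numbers;
while ($text =~ /(\d+)/gc) {
    push @numbers, $1;
}

# Because of /c, the final failed attempt did not reset the position:
# pos still points just past the last number that matched.
my $resume_at = pos($text);

# Continue from that position with a different pattern (an assumption
# for this sketch: grab the run of non-blank characters that follows).
my $after = '';
if ($text =~ /\G(\S+)/g) {
    $after = $1;
}

print "Numbers: @numbers\n";
print "Resumed at offset $resume_at, found '$after'\n";
```

Without /c on the loop, pos would be undefined after the loop ends, and the \G match would start from the beginning of the string instead.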
