Alternation
Inside a pattern or subpattern, use the | metacharacter to specify a set of
possibilities, any one of which could match. For instance:
/Gandalf|Saruman|Radagast/
matches Gandalf or Saruman or Radagast. The alternation extends only as far as
the innermost enclosing parentheses (whether capturing or not):
/prob|n|r|l|ate/        # Match prob, n, r, l, or ate
/pro(b|n|r|l)ate/       # Match probate, pronate, prorate, or prolate
/pro(?:b|n|r|l)ate/     # Match probate, pronate, prorate, or prolate
The second and third forms match the same strings, but the second
form captures the variant character in $1 and the third form does not.
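As a quick check of the three forms, here is a minimal sketch. The test strings "banana" and "prorate" and all variable names are made up for illustration:

```perl
use strict;
use warnings;

# Without parentheses, the alternation spans the whole pattern, so any
# string containing "prob", "n", "r", "l", or "ate" matches.
my $bare_matches = ("banana" =~ /prob|n|r|l|ate/) ? 1 : 0;

# Capturing parentheses: the variant character lands in $1.
my $captured;
if ("prorate" =~ /pro(b|n|r|l)ate/) {
    $captured = $1;
}

# Non-capturing parentheses: same match, but $1 is left undefined.
my $grouped_matched = 0;
my $grouped_capture;
if ("prorate" =~ /pro(?:b|n|r|l)ate/) {
    $grouped_matched = 1;
    $grouped_capture = $1;
}

print "bare pattern matched 'banana': $bare_matches\n";
print "capturing form set \$1 to: $captured\n";
print "non-capturing form matched: $grouped_matched, \$1 is ",
      (defined $grouped_capture ? "'$grouped_capture'" : "undef"), "\n";
```

The bare pattern matches "banana" only because of the lone n alternative, which is rarely what was intended.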
At any given position, the Engine tries to match the first alternative, and then the second, and so on. The relative length of the alternatives does not matter, which means that in this pattern:
/(Sam|Samwise)/
$1 will never be set to Samwise, no matter what string it’s matched
against, because Sam will always match
first. When you have overlapping matches like this, put the longer ones at
the beginning.
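The effect of alternative order can be seen in a short sketch. The sample string and variable names are illustrative assumptions, not from the text:

```perl
use strict;
use warnings;

my $text = "Samwise Gamgee";

# Shorter alternative first: Sam succeeds before Samwise is ever tried.
my ($short_first) = $text =~ /(Sam|Samwise)/;

# Longer alternative first: Samwise gets its chance and wins.
my ($long_first) = $text =~ /(Samwise|Sam)/;

print "short first: $short_first\n";   # Sam
print "long first:  $long_first\n";    # Samwise
```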
But the ordering of the alternatives only matters at a given
position. The outer loop of the Engine does left-to-right matching, so the
following always matches the first Sam:
"'Sam I am,' said Samwise" =~ /(Samwise|Sam)/; # $1 eq "Sam"
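To force right-to-left scanning, use a greedy quantifier such as .*, which first consumes as much of the string as it can and then backtracks until the rest of the pattern matches: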
"'Sam I am,' said Samwise" =~ /.*(Samwise|Sam)/; # $1 eq "Samwise"
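To see where each match lands, the following sketch records the capture offsets using Perl's built-in @- array. The variable names are illustrative:

```perl
use strict;
use warnings;

my $text = "'Sam I am,' said Samwise";

# Plain alternation: the Engine stops at the leftmost position where
# any alternative matches, which is the first Sam.
my ($leftmost) = $text =~ /(Samwise|Sam)/;
my $left_pos = $-[1];    # start offset of group 1 in the last match

# A leading greedy .* grabs the whole string and then backs off, so the
# group ends up matching as far to the right as possible.
my ($rightmost) = $text =~ /.*(Samwise|Sam)/;
my $right_pos = $-[1];

print "leftmost:  '$leftmost' at offset $left_pos\n";
print "rightmost: '$rightmost' at offset $right_pos\n";
```

Note that the second pattern's overall match still begins at the start of the string. Only the captured group moves to the right.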
You can defeat left-to-right (or right-to-left) matching by including any of the various positional assertions we saw ...