Capturing
To capture a substring for later use, put parentheses around the
subpattern that matches it. The first pair of parentheses stores its
substring in $1, the second pair in
$2, and so on. You may use as many
parentheses as you like; Perl just keeps defining more numbered
variables for you to represent these captured strings.
Some examples:
    /(\d)(\d)/   # Match two digits, capturing them into $1 and $2
    /(\d+)/      # Match one or more digits, capturing them all into $1
    /(\d)+/      # Match a digit one or more times, capturing the last into $1
Note the difference between the second and third patterns. The second form is usually what you want. The third form does not create multiple variables for multiple digits, because parentheses are numbered when the pattern is compiled, not when it is matched. Each repetition of the group overwrites $1, so only the last digit survives.
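To see the three patterns side by side, here is a minimal sketch that runs each against the same string. The variable $text and its value are made up for illustration.

```perl
use strict;
use warnings;

my $text = "2024";    # hypothetical sample input

# Two groups of one digit each: the first two digits land in $1 and $2.
if ($text =~ /(\d)(\d)/) {
    print "first: $1, second: $2\n";    # first: 2, second: 0
}

# Quantifier inside the parentheses: the whole run of digits lands in $1.
if ($text =~ /(\d+)/) {
    print "all digits: $1\n";           # all digits: 2024
}

# Quantifier outside the parentheses: the group matches four times,
# but there is still only one $1, and it holds the last repetition.
if ($text =~ /(\d)+/) {
    print "last digit: $1\n";           # last digit: 4
}
```

Each match is tested in an if because the numbered variables are only meaningful after a successful match.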
Captured strings are often called group references because they refer back to parts of the captured text. Historical pattern-matching engines restricted group references to backreferences only, but Perl allows references to any group, whether back, forward, or the one you’re in the middle of solving.
There are actually two ways to get at these capture groups.
The numbered variables you’ve seen are how you get at
captured text outside of a pattern, but they don’t work inside the
pattern. There you have to use backreference notation: \1, \2,
\g{1}, \g{2}, \k<some_group>, \k<other_group>, and so on.
You can’t use $1 for a group reference within the pattern because that would already have been interpolated as ...
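As a small sketch of the notations listed above, the following looks for a doubled word using each form of in-pattern reference. The sample string in $line and the group name word are invented for this example; named groups and \g{N} assume Perl 5.10 or later.

```perl
use strict;
use warnings;

my $line = "this is is a test";    # hypothetical sample input

# \1 inside the pattern means "whatever group 1 matched in this attempt".
if ($line =~ /\b(\w+)\s+\1\b/) {
    print "doubled word: $1\n";           # doubled word: is
}

# \g{1} is the same reference in the braced notation.
if ($line =~ /\b(\w+)\s+\g{1}\b/) {
    print "doubled word: $1\n";           # doubled word: is
}

# A named group is referenced with \k<name> inside the pattern
# and read from the %+ hash outside it.
if ($line =~ /\b(?<word>\w+)\s+\k<word>\b/) {
    print "doubled word: $+{word}\n";     # doubled word: is
}
```

Outside the pattern, the same captured text is still available as $1 (or $+{word} for the named group), which is the other of the two ways mentioned above.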