Backreferences
We mentioned earlier that you can use parentheses to group things for
quantifiers, but you can also use parentheses to remember bits and
pieces of what you matched. A pair of parentheses around a part of a
regular expression causes whatever was matched by that part to be
remembered for later use. It doesn’t change what the part matches, so
/\d+/ and /(\d+)/ will both match as many digits as
possible, but in the latter case the matched digits will be remembered in a
special variable to be backreferenced later.
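As a minimal sketch of that difference (the sample string and the variable names $text and $captured are made up for illustration), both patterns below match the same digits, but only the parenthesized one leaves them in $1:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $text = "order 12345 shipped";

# Without parentheses the pattern matches, but nothing is remembered.
print "plain pattern matched\n" if $text =~ /\d+/;

# With parentheses the same digits are matched and also saved in $1.
my $captured;
if ($text =~ /(\d+)/) {
    $captured = $1;
    print "captured: $captured\n";
}
```

Copying $1 into a named variable right after the match is a common precaution, since a later successful match overwrites it.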
How you refer back to the remembered part of the string depends on
where you want to do it from. Within the same regular expression, you
use a backslash followed by an integer. The integer corresponding to a
given pair of parentheses is determined by counting left parentheses
from the beginning of the pattern, starting with one. So, for example,
to match something similar to an HTML tag like “<B>Bold</B>”, you might use
/<(.*?)>.*?<\/\1>/. The \1 forces the closing tag to repeat exactly the
string captured by the first pair of parentheses, such as the “B” in this
example.
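The following sketch tries that pattern against a few strings. The sample strings and the names $tag_re, @samples, and %tag_for are assumptions made for illustration; only the pattern itself comes from the text above.

```perl
use strict;
use warnings;

# The pattern from the text: \1 must repeat whatever the first group captured.
my $tag_re = qr/<(.*?)>.*?<\/\1>/;

# Hypothetical test strings; the last one closes with a different tag name.
my @samples = ('<B>Bold</B>', '<I>Italic</I>', '<B>Mismatched</I>');

my %tag_for;    # sample string => captured tag name, or undef if no match
for my $s (@samples) {
    if ($s =~ $tag_re) {
        $tag_for{$s} = $1;
        print "$s: matched, tag name is '$1'\n";
    }
    else {
        $tag_for{$s} = undef;
        print "$s: no match\n";
    }
}
```

The third string fails because no choice of tag name lets the text after "</" repeat what the first group captured.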
Outside the regular expression itself, such as in the replacement
part of a substitution, you use a $
followed by an integer; that is, a normal scalar variable named by the
integer. So if you wanted to swap the first two words of a string, for
example, you could use:
s/(\S+)\s+(\S+)/$2 $1/
The right side of the substitution (between the second and third slashes) is mostly just a funny kind of double-quoted string, which is why you can interpolate ...
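To see the word swap in action, here is a small sketch. The input string and the names $phrase and $swapped are illustrative assumptions; the substitution is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical input; any string with at least two words would do.
my $phrase = "hello world again";

# Copy the string, then apply the substitution to the copy.
(my $swapped = $phrase) =~ s/(\S+)\s+(\S+)/$2 $1/;

print "before: $phrase\n";
print "after:  $swapped\n";

# The captures stay available after the substitution succeeds.
my ($first, $second) = ($1, $2);
print "first word was '$first', second was '$second'\n";
```

Only the first two words are exchanged, because the substitution has no /g modifier and stops after its first match.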