Matching Multiple Lines
Problem
You want to use regular expressions on a string containing more than one line, but the special characters . (any character but newline), ^ (start of string), and $ (end of string) don’t seem to work for you. This might happen if you’re reading in multiline records or the whole file at once.
Solution
Use /m, /s, or both as pattern modifiers. /s lets . match newline (normally it doesn’t). If the string had more than one line in it, then /foo.*bar/s could match a "foo" on one line and a "bar" on a following line. This doesn’t affect dots in character classes like [#%.], since they are regular periods anyway.
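As a minimal sketch of the /s behavior described above, the following compares the same pattern with and without the modifier. The sample string and the variable names ($record, $without_s, $with_s) are made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical two-line record used only for illustration.
my $record = "foo on line one\nbar on line two\n";

# Without /s, "." stops at the newline, so the match fails.
my $without_s = ($record =~ /foo.*bar/)  ? 1 : 0;

# With /s, "." also matches newline, so the match can span both lines.
my $with_s    = ($record =~ /foo.*bar/s) ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

Only the meaning of . changes here; ^ and $ behave exactly as they do without /s.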
The /m modifier lets ^ and $ match next to a newline. /^=head[1-7]$/m would match that pattern not just at the beginning of the record, but anywhere right after a newline as well.
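A small sketch of the /m behavior might look like the following. It uses a variant of the pattern above that also captures the heading text; the sample string and the names $pod, @without_m, and @with_m are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical POD-like record held in a single string.
my $pod = "Some intro text\n=head1 NAME\nbody text\n=head2 Details\n";

# Without /m, ^ matches only at the start of the string, so nothing is found.
my @without_m = $pod =~ /^=head[1-7] (.*)$/g;

# With /m, ^ and $ match next to each embedded newline.
my @with_m = $pod =~ /^=head[1-7] (.*)$/mg;

print scalar(@without_m), " heading(s) found without /m\n";
print "headings found with /m: ", join(", ", @with_m), "\n";
```

Because . still refuses to match a newline here, each capture stops at the end of its own line.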
Discussion
A common, brute-force approach to parsing documents where newlines
are not significant is to read the file one paragraph at a time (or
sometimes even the entire file as one string) and then extract tokens
one by one. To match across newlines, you need to make . match a newline; it ordinarily does not. In cases where newlines are important and you’ve read more than one line into a string, you’ll probably prefer to have ^ and $ match beginning- and end-of-line, not just beginning- and end-of-string.
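To illustrate the paragraph-at-a-time approach, here is a rough sketch that reads records in paragraph mode and then uses /m to find the start of each line inside a record. The input text is invented, and an in-memory filehandle stands in for a real file; $text, @paragraphs, and @line_starts are names chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for a real file.
my $text = "first line\nsecond line\n\nthird line\nfourth line\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @paragraphs;
{
    local $/ = "";          # paragraph mode: records end at blank lines
    @paragraphs = <$fh>;
}
close $fh;

# Each record now holds several lines, so ^ needs /m to see every line start.
my @line_starts = map { [ /^(\w+)/mg ] } @paragraphs;

for my $i (0 .. $#paragraphs) {
    print "paragraph $i has lines starting with: @{ $line_starts[$i] }\n";
}
```

Setting $/ to undef instead of the empty string would read the entire file as one string, which is the other case mentioned above.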
The difference between /m and /s is important: /m makes ^ and $ match next to a newline, while /s makes . match newlines. You can even use them together; they’re not ...