Matching Multiple Lines

Problem

You want to use regular expressions on a string containing more than one line, but the special characters . (any character but newline), ^ (start of string), and $ (end of string) don’t seem to work for you. This might happen if you’re reading in multiline records or the whole file at once.

Solution

Use /m, /s, or both as pattern modifiers. /s lets . match a newline (normally it doesn’t). If the string has more than one line in it, then /foo.*bar/s can match a "foo" on one line and a "bar" on a following line. This doesn’t affect dots in character classes like [#%.], since those are literal periods anyway.
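A minimal sketch of the /s behavior described above. The two-line string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical two-line record: "foo" and "bar" sit on different lines.
my $text = "foo one\ntwo bar\n";

# Without /s, . cannot cross the newline, so the match fails.
my $without_s = ($text =~ /foo.*bar/)  ? 1 : 0;

# With /s, . may match the newline, so the match spans both lines.
my $with_s    = ($text =~ /foo.*bar/s) ? 1 : 0;

print "without /s: $without_s\n";   # prints "without /s: 0"
print "with /s:    $with_s\n";      # prints "with /s:    1"
```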

The /m modifier lets ^ match just after any newline and $ match just before one, in addition to the start and end of the string. /^=head[1-7]$/m would match that pattern not just at the beginning of the record, but anywhere right after a newline as well.
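As a rough illustration of /m, the following sketch runs the same pattern with and without the modifier. The record contents and variable names are assumptions chosen for the example:

```perl
use strict;
use warnings;

# Hypothetical record with a heading directive on its second line.
my $record = "Some text\n=head1\nMore text\n";

# Without /m, ^ anchors only at the start of the string, so this fails.
my $without_m = ($record =~ /^=head[1-7]$/)  ? 1 : 0;

# With /m, ^ and $ also anchor around the embedded newlines.
my $with_m    = ($record =~ /^=head[1-7]$/m) ? 1 : 0;

# With /m and /g, ^ anchors at the start of every line in the record;
# lines that do not begin with a word character are skipped.
my @line_starts = $record =~ /^(\w+)/mg;

print "without /m: $without_m\n";        # prints "without /m: 0"
print "with /m:    $with_m\n";           # prints "with /m:    1"
print "line starts: @line_starts\n";     # prints "line starts: Some More"
```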

Discussion

A common, brute-force approach to parsing documents where newlines are not significant is to read the file one paragraph at a time (or sometimes even the entire file as one string) and then extract tokens one by one. To match across newlines, you need to make . match a newline; it ordinarily does not. In cases where newlines are important and you’ve read more than one line into a string, you’ll probably prefer to have ^ and $ match beginning- and end-of-line, not just beginning- and end-of-string.
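One way the paragraph-at-a-time approach might look, as a sketch only. It assumes paragraphs are separated by blank lines, and it uses an in-memory filehandle with made-up data in place of a real file:

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for a real file.
my $data = "alpha beta\ngamma\n\ndelta\nepsilon zeta\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @first_words;    # one array ref of line-initial words per paragraph
{
    # Paragraph mode: each read returns one blank-line-delimited record.
    local $/ = "";
    while (my $para = <$fh>) {
        # /m lets ^ anchor at the start of every line inside the paragraph.
        push @first_words, [ $para =~ /^(\w+)/mg ];
    }
}
close $fh;

print "paragraph ", $_ + 1, ": @{ $first_words[$_] }\n" for 0 .. $#first_words;
```

Setting $/ to undef instead of the empty string would read the entire file as one string, which is the other option the paragraph above mentions.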

The difference between /m and /s is important: /m makes ^ and $ match next to a newline, while /s makes . match newlines. You can even use them together—they’re not ...
