Reading Records with a Pattern Separator
Problem
You want to read in records separated by a pattern, but Perl doesn’t allow its input record separator variable to be a regular expression.
Many problems, most obviously those involving the parsing of complex file formats, become much simpler when you can easily extract records that might be separated by any of several different strings.
Solution
Read the whole file and use split:
    undef $/;
    @chunks = split(/pattern/, <FILEHANDLE>);
Discussion
Perl’s record separator must be a fixed string, not a pattern. (After all, awk has to be better at something.) To sidestep this limitation, undefine the input record separator entirely so that the next line-read operation gets the rest of the file. This is sometimes called slurp mode, because it slurps in the whole file as one big string. Then split that huge string using the record-separating pattern as the first argument.
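The slurp-then-split approach can be tried without a real file by opening a filehandle on an in-memory string. The following is a minimal sketch: the sample data, the separator pattern (a line made up entirely of dashes or equals signs), and the variable names are assumptions chosen for illustration, not part of the original recipe.

```perl
use strict;
use warnings;

# Hypothetical sample data: records separated by lines of dashes or equals signs.
my $data = "alpha\n---\nbeta\n=====\ngamma\n";

# Open a read-only in-memory filehandle on the string.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @chunks;
{
    local $/;    # undefined inside this block, so the next read slurps everything
    # split evaluates its second argument in scalar context,
    # so <$fh> returns the whole input as one string.
    @chunks = split(/^(?:-+|=+)\n/m, <$fh>);
}
close($fh);

print "I read ", scalar(@chunks), " chunks.\n";
```

Because the pattern uses a non-capturing group, only the text between separators comes back. Slurping holds the entire input in memory, so this sketch suits inputs that fit comfortably in RAM.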
Here’s an example, where the input stream is a text file that includes lines consisting of ".Se", ".Ch", and ".Ss", which are special codes in the troff macro set that this book was developed under. These lines are the separators, and we want to find text that falls between them.
    # .Ch, .Se and .Ss divide chunks of STDIN
    {
        local $/ = undef;
        @chunks = split(/^\.(Ch|Se|Ss)$/m, <>);
    }
    print "I read ", scalar(@chunks), " chunks.\n";
We create a localized version of $/ so its previous value gets restored after the block finishes. By using split with parentheses in the pattern, ...
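As documented for Perl's split, capturing parentheses in the pattern cause the captured text to be returned as extra list elements between the fields. The sketch below compares a capturing and a non-capturing version of the separator pattern on a made-up sample string; $text, @with, and @without are illustrative names, not from the original.

```perl
use strict;
use warnings;

# Hypothetical input containing two troff-style separator lines.
my $text = "intro\n.Ch\nchapter text\n.Se\nsection text\n";

# Capturing group: each matched code (Ch, Se, Ss) is returned between the chunks.
my @with = split(/^\.(Ch|Se|Ss)$/m, $text);

# Non-capturing group: only the chunks themselves are returned.
my @without = split(/^\.(?:Ch|Se|Ss)$/m, $text);

print "With captures:    ", scalar(@with), " elements\n";
print "Without captures: ", scalar(@without), " elements\n";
```

With the capturing form, the list alternates between chunk text and separator code, which is useful when each chunk needs to be tied to the code that introduced it. Note that $ does not consume the newline after the separator, so that newline stays at the start of the following chunk.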