Imagine you have a table with both old and new strings, such as the following.
The program in Example 1.4 is a filter that changes all occurrences of each element in the first set to the corresponding element in the second set.
When called without filename arguments, the program is a simple
filter. If filenames are supplied on the command line, an in-place
edit writes the changes to the files, with the original versions
safely saved in a file with a ".orig"
extension.
See Section 7.9 for a description. A -v command-line option writes notification of
each change to standard error.
The table of original strings and their replacements is stored below
__END__
in the main program as described in
Section 7.6. Each pair of strings is converted into
carefully escaped substitutions and accumulated into the
$code
variable like the
popgrep2 program in Section 6.10.
A -t
check to test for an interactive run check
tells whether we’re expecting to read from the keyboard if no
arguments are supplied. That way if the user forgets to give an
argument, they aren’t wondering why the program appears to be
hung.
Example 1-4. fixstyle
#!/usr/bin/perl -w # fixstyle - switch first set of <DATA> strings to second set # usage: $0 [-v] [files ...] use strict; my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift); if (@ARGV) { $^I = ".orig"; # preserve old files } else { warn "$0: Reading from stdin\n" if -t STDIN; } my $code = "while (<>) {\n"; # read in config, build up code to eval while (<DATA>) { chomp; my ($in, $out) = split /\s*=>\s*/; next unless $in && $out; $code .= "s{\\Q$in\\E}{$out}g"; $code .= "&& printf STDERR qq($in => $out at \$ARGV line \$.\\n)" if $verbose; $code .= ";\n"; } $code .= "print;\n}\n"; eval "{ $code } 1" || die; __END__ analysed => analyzed built-in => builtin chastized => chastised commandline => command-line de-allocate => deallocate dropin => drop-in hardcode => hard-code meta-data => metadata multicharacter => multi-character multiway => multi-way non-empty => nonempty non-profit => nonprofit non-trappable => nontrappable pre-define => predefine preextend => pre-extend re-compiling => recompiling reenter => re-enter turnkey => turn-key
One caution: This program is fast, but it doesn’t scale if you
need to make hundreds of changes. The larger the
DATA
section, the longer it takes. A few dozen
changes won’t slow it down, and in fact, the version given in
the solution above is faster for that case. But if you run the
program on hundreds of changes, it will bog down.
Example 1.5 is a version that’s slower for few changes but faster when there are many changes.
Example 1-5. fixstyle2
#!/usr/bin/perl -w # fixstyle2 - like fixstyle but faster for many many matches use strict; my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift); my %change = (); while (<DATA>) { chomp; my ($in, $out) = split /\s*=>\s*/; next unless $in && $out; $change{$in} = $out; } if (@ARGV) { $^I = ".orig"; } else { warn "$0: Reading from stdin\n" if -t STDIN; } while (<>) { my $i = 0; s/^(\s+)// && print $1; # emit leading whitespace for (split /(\s+)/, $_, -1) { # preserve trailing whitespace print( ($i++ & 1) ? $_ : ($change{$_} || $_)); } } __END__ analysed => analyzed built-in => builtin chastized => chastised commandline => command-line de-allocate => deallocate dropin => drop-in hardcode => hard-code meta-data => metadata multicharacter => multi-character multiway => multi-way non-empty => nonempty non-profit => nonprofit non-trappable => nontrappable pre-define => predefine preextend => pre-extend re-compiling => recompiling reenter => re-enter turnkey => turn-key
This version breaks each line into chunks of whitespace and words, which isn’t a fast operation. It then uses those words to look up their replacements in a hash, which is much faster than a substitution. So the first part is slower, the second faster. The difference in speed depends on the number of matches.
If we didn’t care about keeping the amount of whitespace separating each word constant, the second version can run as fast as the first even for a few changes. If you know a lot about your input, you can collapse whitespace into single blanks by plugging in this loop:
# very fast, but whitespace collapse while (<>) { for (split) { print $change{$_} || $_, " "; } print "\n"; }
That leaves an extra blank at the end of each line. If that’s a
problem, you could use the technique from Section 16.14 to install an output filter. Place the
following code in front of the while
loop
that’s collapsing whitespace:
my $pid = open(STDOUT, "|-"); die "cannot fork: $!" unless defined $pid; unless ($pid) { # child while (<STDIN>) { s/ $//; print; } exit; }
Get Perl Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.