Errata for Mastering Regular Expressions

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed Page 67
Second paragraph

In

$var =~ s/~[0-9]+~/~<CODE>$&</CODE>~/g

there are four ~ characters that I can't see the purpose of. In fact, if
digits are not enclosed as in ~dddd~, there will be no match.

Why not just

$var =~ s/[0-9]+/<CODE>$&</CODE>/g

This would also better tie in with the statement: "The replacement string
is <CODE>$&</CODE>" a couple of lines down.
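The submitter's point can be checked in isolation. The following is a minimal sketch in Python, used here as a stand-in for the book's Perl; the function names are hypothetical, and `m.group(0)` plays the role of Perl's `$&`. It compares the substitution as printed, tildes included, with the plain version suggested above.

```python
import re


def wrap_as_printed(text):
    # Pattern as printed on page 67: the digits must sit between two
    # literal ~ characters, and the whole match (tildes included) is wrapped.
    return re.sub(r"~[0-9]+~",
                  lambda m: "~<CODE>" + m.group(0) + "</CODE>~",
                  text)


def wrap_plain(text):
    # Suggested correction: any run of digits is wrapped in <CODE>...</CODE>.
    return re.sub(r"[0-9]+",
                  lambda m: "<CODE>" + m.group(0) + "</CODE>",
                  text)


print(wrap_as_printed("page 67"))  # no tildes in the input, so nothing changes
print(wrap_plain("page 67"))
```

With the tildes in the pattern, bare digits are left untouched, which is the behavior the report describes.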

Anonymous   
Printed Page 112
Last paragraph

The line beginning:

tice, which text will to(ur(nament)?)?? actually match:

should be (the last ? isn't part of the regex):

tice, which text will to(ur(nament)?)? actually match:
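The effect of one extra trailing ? can be tried out directly. The sketch below uses Python's re module as a stand-in for Perl, on the assumption that a doubled ?? would be read as a non-greedy optional, as both Perl 5 and Python do. The helper name and the sample strings are made up for illustration.

```python
import re

# The expression as the submitter says it should read.
GREEDY = re.compile(r"to(ur(nament)?)?")
# The expression with the extra ?, which makes the outer group non-greedy.
LAZY = re.compile(r"to(ur(nament)?)??")


def matched_text(pattern, text):
    """Return the text matched by the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


for sample in ("tournament", "tourname", "toward"):
    print(sample, matched_text(GREEDY, sample), matched_text(LAZY, sample))
```

Under that assumption the two forms are not interchangeable: on "tournament" the greedy form matches the whole word, while the non-greedy form stops after "to".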

Anonymous   
Printed Page 155
2nd paragraph

The line:

Usually, an algorithm called "Boyer-Moore" us used.

should be:

Usually, an algorithm called "Boyer-Moore" is used.

Also, the Tcl home page is no longer hosted by Sun; it is now:

www.scriptics.com

Anonymous   
Printed Page 180
1st paragraph, 1st sentence

The sentence reads:
When it comes down to it, thought and logic take you most, but not necessarily all,
the way to an efficient program.

The word "of" is missing. The correct version is as follows:
When it comes down to it, thought and logic take you most of, but not necessarily
all, the way to an efficient program.

Anonymous   
Printed Page 205
program to parse CSV files

The program does not correctly parse CSV files from Excel when an Excel cell
contains the " character. For example, consider the following Excel
worksheet:

a b c
foo 1'3"high baz
"both"

When written by Excel as a CSV one gets:

a,b,c
foo,"1'3""high",baz
"""both"""

Note that the " in 1'3" has been doubled inside the enclosing pair of double
quotes. In general, each " within a cell is replaced with "", and a cell that
contains a " is then wrapped in a single pair of ".
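The quoting rule just described can be written down as a short sketch. It is in Python rather than Perl, the function names are hypothetical, and it assumes that a cell is quoted only when it contains a double quote, a comma, or a line break, which is consistent with the sample above but is not taken from any Excel documentation.

```python
def quote_cell(cell):
    """Quote one cell the way the CSV sample above appears to be quoted."""
    # Assumption: only cells containing a double quote, a comma, or a line
    # break are quoted; all other cells are written unchanged.
    if any(ch in cell for ch in '",\r\n'):
        # Double every embedded " and wrap the result in one pair of ".
        return '"' + cell.replace('"', '""') + '"'
    return cell


def to_csv_line(cells):
    """Join already-quoted cells with commas to form one CSV line."""
    return ",".join(quote_cell(cell) for cell in cells)


print(to_csv_line(["foo", "1'3\"high", "baz"]))  # foo,"1'3""high",baz
print(to_csv_line(['"both"']))                   # """both"""
```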

The following program correctly parses CSV files produced by Microsoft Excel
97:

#---------------------------------------------------------------------

sub parseCSV { # (str) returns array of CSV entries (comma-separated fields)
    # Starts from the method on p. 205 of Jeffrey E. F. Friedl's
    # "Mastering Regular Expressions", modified to interpret "" as a quoted ".
    # Correctly parses CSV files produced by Microsoft Excel 97.
    my $comma  = '';
    my $str    = $_[0];
    my $preStr = $str;      # untouched copy of the input, used in warnings
    my @fields = ();        # initialize to empty
    until ( $str eq '' ) {
        my $thisField = '';
        if ( $str =~ m{^"([^",]*)"(|,(.*))$} ) {        # simple quoted field
            $thisField = $1; $str = defined($3) ? $3 : ''; $comma = $2;
        } elsif ( $str =~ m{^([^",]*)(|,(.*))$} ) {     # unquoted field
            $thisField = $1; $str = defined($3) ? $3 : ''; $comma = $2;
        } elsif ( $str =~ m{^"(.*)$} ) {                # there is a leading "
            $str = $1;
            # consume each "" in the remainder of this field, keeping one "
            while ( $str =~ m{^([^"]*")"(.*)$} ) {
                $thisField .= $1;
                $str = $2;
            }
            # hopefully we are at [^"]*", or at " at end of line
            # (anchored with ^ so that no leading text is silently skipped)
            if ( $str =~ m{^([^"]*)"(|,(.*))$} ) {
                $thisField .= $1;
                $str = defined($3) ? $3 : '';
                $comma = $2;
            } else {
                warn "Could not find a \" following >$thisField< in >$preStr<";
                $thisField .= $str;
                $str = '';
            }
        } else {
            warn "Could not match >$str< in >$preStr<";
            $str = '';
        }
        push( @fields, $thisField ); # add the just matched field
    }
    push( @fields, undef ) if $comma =~ m/,$/; # account for an empty last field
    return @fields;
} # end parseCSV
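For comparison, the same idea can be sketched with a single regular expression per field. This is an illustrative Python version, not the submitter's method and not the book's: the names FIELD and parse_csv_line are hypothetical, and the input is assumed to be one line with no trailing line break and no embedded line breaks inside quoted cells.

```python
import re

# One field followed by its separator. Groups: 1 = body of a quoted field,
# 2 = text of an unquoted field, 3 = the separator (empty at end of line).
FIELD = re.compile(r'''
    (?: "((?:[^"]|"")*)"    # quoted field, where "" stands for one "
      | ([^",]*)            # unquoted field
    )
    (,|$)                   # field separator or end of line
''', re.VERBOSE)


def parse_csv_line(line):
    """Split one CSV line into fields, undoubling "" inside quoted fields."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        if m is None:
            raise ValueError("could not parse %r at offset %d" % (line, pos))
        quoted, plain, sep = m.groups()
        if quoted is not None:
            fields.append(quoted.replace('""', '"'))
        else:
            fields.append(plain)
        pos = m.end()
        if sep == "":       # reached the end of the line
            return fields


print(parse_csv_line('foo,"1\'3""high",baz'))
print(parse_csv_line('"""both"""'))
```

On the sample lines from this report, the result agrees with Python's standard csv module, whose default dialect is named after Excel. That makes the module a convenient cross-check for any hand-written parser.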

Anonymous   
Printed Page 243
Under "Byte notations"

It now reads: "...Thus, \12 is a backreference if the expression has at least
12 sets of capturing parentheses, ..."

I think maybe that should say "...has at least 12 sets of capturing
parentheses in the expression up to that point..."

I think it is an important distinction. I ran across this while
looking at our current parsing of backreferences (which is broken).

Just a suggestion.

Anonymous   
Printed Page 302
first sentence (continued from last page)

"..., and the discussion of its touchy optimization needs is valuable."
This should be:
"..., and the discussion of its touchy optimization is valuable."
or:
"..., and the discussion that its touchy optimization needs is valuable."

Anonymous   
Printed Page 309
indented url list

The web pages listed for Jeffrey Friedl's home page are either in error or out
of date. The correct URLs are as follows:

http://enterprise.dsi.crc.ca/cgi-bin/j-e/jfriedl.html
http://linear.mv.com/cgi-bin/j-e/jfriedl.html

The UK-based one, merlin.soc.staffs.ac.uk, does not work, and I have not been
able to find the correct address.

Anonymous