Mastering Regular Expressions
by Jeffrey E.F. Friedl

Unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

This page was updated June 20, 2002.

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

UNCONFIRMED errors and suggestions from readers:

[67] Second paragraph;
In $var =~ s/~[0-9]+~/~$&<\/CODE>~/g there are four ~ characters whose purpose I can't see. In fact, if the digits are not enclosed as in ~dddd~, there will be no match. Why not just
    $var =~ s/[0-9]+/$&<\/CODE>/g
This would also tie in better with the statement "The replacement string is $&<\/CODE>" a couple of lines down.

(112) Last paragraph;
The line beginning:
    tice, which text will to(ur(nament)?)?? actually match:
should be (the last ? isn't part of the regex):
    tice, which text will to(ur(nament)?)? actually match:

(155) 2nd paragraph;
The line:
    Usually, an algorithm called "Boyer-Moore" us used.
should be:
    Usually, an algorithm called "Boyer-Moore" is used.
Also, the Tcl home page is no longer at Sun; it is now www.scriptics.com.

(180) 1st paragraph, 1st sentence;
The sentence reads:
    When it comes down to it, thought and logic take you most, but not necessarily all, the way to an efficient program.
The word "of" is missing. The correct version is:
    When it comes down to it, thought and logic take you most of, but not necessarily all, the way to an efficient program.

[205] Program to parse CSV files:
The program does not correctly parse CSV files from Excel when a cell contains the " character.
For example, consider the following Excel worksheet:
    a         b           c
    foo       1'3"high    baz
    "both"
When it is written by Excel as CSV, one gets:
    a,b,c
    foo,"1'3""high",baz
    """both"""
Note that the " in 1'3" has been doubled, and the whole cell has been wrapped in an outer pair of double quotes. In general, each " is replaced with "", and each such cell is prefixed and suffixed by a single ". The following program correctly parses CSV files produced by Microsoft Excel 97:

    #---------------------------------------------------------------------
    sub parseCSV {
      # (str) returns array of CSV entries (comma-separated fields)
      # start with Jeffrey E. F. Friedl, "Mastering Regular Expressions" method on p 205
      # modified to interpret "" as a quoted "
      # Correctly parses CSV files produced by Microsoft Excel 97.
      my $comma  = '';
      my $str    = $_[0];
      my $preStr = $str;          # original line, kept for warning messages
      my @fields = ();            # start with an empty list
      until ( $str eq '' ) {
        my $thisField = '';
        if ( $str =~ m{^"([^",]*)"(|,(.*))$} ) {
          # simple quoted field: no embedded commas or quotes
          # ($3 is undef at end of line, so fall back to '')
          $thisField = $1; $comma = $2; $str = defined $3 ? $3 : '';
        } elsif ( $str =~ m{^([^",]*)(|,(.*))$} ) {
          # unquoted field
          $thisField = $1; $comma = $2; $str = defined $3 ? $3 : '';
        } elsif ( $str =~ m{^"(.*)$} ) {
          # there is a leading ": field may contain commas and "" pairs
          $str = $1;
          # consume each "" pair, keeping a single " in the field
          while ( $str =~ m{^([^"]*")"(.*)$} ) {
            $thisField .= $1; $str = $2;
          }
          # now expect [^"]*" followed by a comma or the end of the line
          if ( $str =~ m{^([^"]*)"(|,(.*))$} ) {
            $thisField .= $1; $comma = $2; $str = defined $3 ? $3 : '';
          } else {
            warn "Could not find a \" following >$thisField< in\n>$preStr<";
            $thisField .= $str; $str = '';
          }
        } else {
          warn "Could not match >$str< in\n>$preStr<";
          $str = '';
        }
        push( @fields, $thisField );   # add the just-matched field
      }
      push( @fields, undef ) if $comma =~ m/,$/;   # account for an empty last field
      return @fields;
    } # end parseCSV

{243} Under "Byte notations";
It now reads: "...Thus, \12 is a backreference if the expression has at least 12 sets of capturing parentheses, ..." I think it should perhaps say "...has at least 12 sets of capturing parentheses in the expression up to that point..." I think it is an important distinction.
I ran across this while looking at our current parsing of backreferences (which is broken). Just a suggestion.

(302) First sentence (continued from the previous page);
    "..., and the discussion of its touchy optimization needs is valuable."
This should be:
    "..., and the discussion of its touchy optimization is valuable."
or:
    "..., and the discussion that its touchy optimization needs is valuable."

(309) Indented URL list;
The web pages listed for Jeffrey Friedl's home page are either in error or out of date. The correct URLs are:
    http://enterprise.dsi.crc.ca/cgi-bin/j-e/jfriedl.html
    http://linear.mv.com/cgi-bin/j-e/jfriedl.html
The UK-based one, merlin.soc.staffs.ac.uk, does not work, and I have not been able to find the correct version.
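The Excel quoting rule described in the [205] report (a field is either unquoted, or wrapped in " with each embedded " written as "") can also be handled by a single regex applied repeatedly with \G and /g, instead of the multi-branch loop in parseCSV. The sketch below is one possible way to do that, not the book's p. 205 code. The helper name parse_csv_line is hypothetical, and the sketch assumes one record per line with no embedded newlines; malformed quoting is not reported.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of Excel-style CSV into fields.
# Assumes a field is either unquoted (no " or ,) or wrapped in "...",
# with each embedded " written as "". No embedded newlines.
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    # Each match starts where the previous one ended (\G), consumes the
    # separating comma (or the start of the line), then one field.
    while ($line =~ m{\G(?:^|,)(?:"((?:[^"]|"")*)"|([^",]*))}g) {
        if (defined $1) {
            # quoted field: undo the "" doubling
            (my $field = $1) =~ s/""/"/g;
            push @fields, $field;
        } else {
            # unquoted field (possibly empty)
            push @fields, $2;
        }
    }
    return @fields;
}
```

For the example line in the report, parse_csv_line(q{foo,"1'3""high",baz}) returns 'foo', 1'3"high and 'baz'. Unlike parseCSV, a trailing comma here yields an empty string rather than undef for the last field.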