Introducing Regular Expressions

Errata for Introducing Regular Expressions



The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.




Version Location Description Submitted By Date Submitted Date Corrected
PDF
Page XI
First URL

The printed URL http://orei.ly/intro_regex does not work. It should be changed to http://oreil.ly/intro_regex

Note from the Author or Editor:
The submitter of the erratum is correct. This link is broken. It should be: http://oreil.ly/intro_regex

Steven Levithan  Jul 20, 2012  Nov 21, 2012
Safari Books Online
iv
bottom 3rd paragraph

In the Safari Books Online version, the link in the preface section "What you need to use this book" returns an HTTP 404 (Page Not Found) error: http://examples.oreilly.com/9781449392680/examples.zip

Note from the Author or Editor:
The correct link is: http://examples.oreilly.com/0636920012337/examples.zip (it is already in the book source).

Holger Boeken  Aug 15, 2012  Nov 21, 2012
Safari Books Online
?
?

In the section "Quoting Literals" at the end of chapter 1, the sample regular expression doesn't find a space after the parentheses that surround the area code, and it doesn't find a "1" followed by a hyphen or dot at the beginning of the phone number, as is typical with 1-800 numbers. Also, by finding numbers only at the beginning and end of a line, it doesn't work with multiple phone numbers in the lower window of regexpal. Maybe these things are beyond what you want to introduce in this chapter, but would you consider including them in order to make a more robust expression? Possibly change ^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$ to (1[.-]?)?(\(\d{3}\)|\d{3}[.-]?)?\s?\d{3}[.-]?\d{4}
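A quick way to compare the two expressions is to run both against a few sample numbers. This is a minimal sketch that uses Python's `re` module as a stand-in for the JavaScript engine behind RegexPal; the two flavors should behave the same for these ASCII-only samples, but that is an assumption, and the sample numbers are made up.

```python
import re

# Expression printed in "Quoting Literals": anchored to the start and end of the line
ORIGINAL = r"^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$"
# Alternative suggested in this erratum: unanchored, with an optional leading "1"
# and optional whitespace after the area code
PROPOSED = r"(1[.-]?)?(\(\d{3}\)|\d{3}[.-]?)?\s?\d{3}[.-]?\d{4}"

# Hypothetical sample numbers
samples = ["555-123-4567", "(555) 123-4567", "1-800-555-0199"]


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


for number in samples:
    print(number, "original:", matches(ORIGINAL, number),
          "proposed:", matches(PROPOSED, number))

# Two numbers in one block of text, as in RegexPal's lower window
text = "555-123-4567\n(555) 123-4567"
print([m.group() for m in re.finditer(PROPOSED, text)])
```

Only the first sample matches ORIGINAL, while PROPOSED matches all three and finds both numbers in `text`. Because PROPOSED has no anchors, it will also match digits embedded in longer strings; that is the trade-off for dropping `^` and `$`.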

Note from the Author or Editor:
This is a very good question from a reader I know personally. It is a complex question, so I have added it to the last chapter as an exercise for the reader. I have updated the text in Atlas to reflect this change.

Karl Hayes  Nov 13, 2012 
ePub
Page TOC
All links in the Table of Contents

When reading the book on a Kobo Glo, if you tap a link in the Table of Contents to go to a particular section, the page number at the bottom changes to reflect the new page, but the rest of the screen goes black. Trying to go back a page has no effect, and trying to go forward causes the Kobo to close the book and return to the home screen. If you tap through the pages or use the scrubber bar to reach the section you are looking for, the page displays fine. When I compare the toc.ncx with the one in another O'Reilly ePub, Mastering Regular Expressions, the only obvious difference is that the id attributes in Introducing Regular Expressions begin with an underscore and use underscores to separate the words of the section heading, as in "_Section_Heading". A quick search online suggests that the underscore character is not allowed in an id attribute under CSS1 and CSS2. It seems likely that removing the underscores from the id attributes would fix the problem and might improve compatibility with other devices as well. I haven't had the chance to confirm this theory. After I learn a bit more about regular expressions, I may try altering the ids to see if that fixes the problem.

Note from the Author or Editor:
This is a legitimate technical formatting error, beyond the reach of the author. See: https://github.com/MakerPress/atlas-public-feedback/issues/237 I hope we can find a way to address this issue. Thank you, Mike Fitzgerald

Thomas Gallagher  Mar 21, 2013 
ePub
Page TOC
All links in the Table of Contents

As a follow-up to my previous erratum, I have confirmed that the problem with the TOC is related to the use of the underscore character in the id attribute. With an underscore in the id attribute and in the corresponding link in the table of contents, the Kobo Glo goes to a black screen and then crashes back to the home screen when a link in the TOC is tapped. Changing the anchor tag for the "Who Should Read This Book" section in pr02.html from id="_who_should_read_this_book" to id="whoShouldReadThisBook", and modifying the corresponding link in the toc.ncx file to match, made this TOC link work correctly on the Kobo Glo. From further research, it seems that the problem is likely just the leading underscore and that the other underscores shouldn't be a problem, though I don't want to take the time to test that variation. HTML 4 and XHTML (which is specified in the DOCTYPE declaration of the files in this ePub) do not allow an id to start with an underscore or a digit; they require that it start with a letter. Many browsers accept ids that do not follow this rule, and HTML5 lifts this restriction. The Kobo doesn't seem to handle violations of this rule well.
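The id rule described above is itself a regular expression. This is a sketch that checks ids against the HTML 4.01 ID token rule (a letter, then any mix of letters, digits, hyphens, underscores, colons, and periods) and reproduces the submitter's rename. The `to_camel_case` helper is hypothetical; simply stripping the leading underscore would also satisfy the rule, and a real fix must update both the id in the XHTML file and the matching link in toc.ncx.

```python
import re

# HTML 4.01 ID/NAME token rule: must begin with a letter, followed by
# letters, digits, hyphens, underscores, colons, or periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def is_html4_id(value: str) -> bool:
    """Return True if value is a valid HTML 4.01 ID token."""
    return HTML4_ID.fullmatch(value) is not None


def to_camel_case(value: str) -> str:
    """Hypothetical helper: "_who_should_read_this_book" -> "whoShouldReadThisBook"."""
    words = [w for w in value.split("_") if w]
    if not words:
        return value
    return words[0] + "".join(w.capitalize() for w in words[1:])


old_id = "_who_should_read_this_book"
new_id = to_camel_case(old_id)
print(old_id, is_html4_id(old_id))  # _who_should_read_this_book False
print(new_id, is_html4_id(new_id))  # whoShouldReadThisBook True
```

Note that `is_html4_id("who_should_read_this_book")` is also True, which is consistent with the submitter's suspicion that only the leading underscore matters.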

Note from the Author or Editor:
This is a legitimate technical formatting error, beyond the reach of the author. See: https://github.com/MakerPress/atlas-public-feedback/issues/237 I hope we can find a way to address this issue. Thank you, Mike Fitzgerald

Thomas Gallagher  Mar 21, 2013 
PDF
Page 17-20
Various

The descriptions of \w, \W, \d, \D, \s, and \S are misleading, incomplete, or wrong. \w is described as equivalent to [a-zA-Z0-9], and \W as equivalent to [^a-zA-Z0-9]. However, \w always matches at least the underscore, in addition to the described characters. In some regex flavors, \w matches many more Unicode characters. \s is described as equivalent to [ \t\n\r], and \S as equivalent to [^ \t\n\r]. However, \s matches many more whitespace characters in most regex flavors, including in JavaScript (used by RegexPal) and ActionScript (used by RegExr). No mention is made that \d matches more than simply [0-9] in some regex flavors.
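The differences are easy to see by testing single characters against each shorthand. This sketch uses Python 3's `re` module only as one example flavor: by default it applies Unicode rules, and `re.ASCII` restricts the shorthands to ASCII. JavaScript (RegexPal) and ActionScript (RegExr) draw the lines differently, so treat the output as Python behavior, not a cross-flavor reference.

```python
import re


def matching(pattern: str, chars: list[str], flags: int = 0) -> list[str]:
    """Return the characters from chars that the one-character pattern matches."""
    return [c for c in chars if re.fullmatch(pattern, c, flags)]


word_chars = ["a", "Z", "5", "_", "é"]
digits = ["7", "\u0663"]  # U+0663 ARABIC-INDIC DIGIT THREE
spaces = [" ", "\t", "\n", "\r", "\f", "\v", "\u00a0"]  # U+00A0 NO-BREAK SPACE

print(matching(r"\w", word_chars))             # ['a', 'Z', '5', '_', 'é']
print(matching(r"\w", word_chars, re.ASCII))   # ['a', 'Z', '5', '_']
print(matching(r"\d", digits))                 # ['7', '٣']
print(matching(r"\d", digits, re.ASCII))       # ['7']
print(len(matching(r"\s", spaces)))            # 7
print(len(matching(r"\s", spaces, re.ASCII)))  # 6
```

Even in ASCII mode, `\w` includes the underscore, and `\s` includes `\f` and `\v` in addition to `[ \t\n\r]`.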

Note from the Author or Editor:
Steven's explanations are correct and better than mine. These changes cover several pages, and I will need to work with my editor to get hold of those pages and make all the rewording changes.

Steven Levithan
O'Reilly Author 
Jul 20, 2012 
PDF
Page 19-20
Table 2-1

Various issues:
- Some whitespace escapes and shorthands are included, while others are not and are instead listed in Table 2-2. Specifically, \f, \r, \n, \s, \S, \t, and \v should all move from Table 2-1 to Table 2-2. Alternatively, all of the escapes and shorthands included in Table 2-2 should also be included in Table 2-1.
- There is a listing for "\d xxx", with the description "Decimal value for a character". Which flavor supports this? It's not supported by any of the main flavors covered by the book.
- There is a listing for "pass:[<literal>\o</literal> <replaceable>\xxx</replaceable>]", with the description "Octal value for a character". It seems that something went wrong in editing, and DocBook XML is showing up in the printed material.
- The listing "\ xxx" (Hexadecimal value for a character) should be "\x xx".
- It seems wrong to include \b and \B in this table, since they are zero-length assertions that work nothing like the rest of the escapes and shorthands in the table.
- Many, many escapes and shorthands are missing, if this is meant to be a reasonably comprehensive cross-flavor reference.
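The point about \b and \B is easiest to see by comparing what each kind of token consumes. In this sketch (Python's `re` as an example flavor; the sample text is illustrative), the hex escape `\x41` matches one character, while `\b` matches only positions between characters.

```python
import re

text = "ancyent marinere"

# \xhh is a character escape: it consumes exactly one character (here "A", 0x41)
print(re.findall(r"\x41", "A or B"))  # ['A']

# \b is a zero-length assertion: every match is an empty span at a word boundary
print([m.span() for m in re.finditer(r"\b", text)])  # [(0, 0), (7, 7), (8, 8), (16, 16)]
```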

Note from the Author or Editor:
The submitter is correct. These errors need to be cleaned up. It is difficult to submit the changes without seeing the whole text available in the repository. I will work with my editor to get these corrected and submitted to you.

Steven Levithan
O'Reilly Author 
Jul 20, 2012 
PDF
Page 21
Tip box at end of page

The tip box says that \v does not work in RegExr (implicitly, this means that \v doesn't work in ActionScript 3). Here's the text: "If you try \h, \H, or \V in RegExr, you will see results, but not with \v. Not all whitespace shorthands work with all regex processors." However, this is wrong. RegExr supports \v just fine. You can see this by switching to the Replace tab. Also, the sentence "Not all whitespace shorthands work with all regex processors" is misleading--all of these shorthands either work, don't work, or are used for matching things unrelated to whitespace, depending on the regex flavor. No regex flavor that I'm aware of supports, e.g., just vertical whitespace via \v, or just horizontal whitespace via \h.

Note from the Author or Editor:
Correct. Just delete the tip/note mentioned in the errata.

Steven Levithan
O'Reilly Author 
Jul 20, 2012  Nov 21, 2012
PDF
Page 22
4th paragraph ("In Figure 2-6...")

The text says: "In Figure 2-6, you see that the dot matches the first character in the target, namely, the letter T." The caption says the same. But the figure shows the whole text highlighted. Note that the 'global' checkbox is checked; if it is unchecked, RegExr highlights only the 'T'.
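Python has no "global" checkbox, but the difference between `re.search` (first match only) and `re.findall` (every match) is roughly analogous. This sketch uses a hypothetical target string to show why the dot highlights only the T without global and every character with it.

```python
import re

target = "The Rime"  # hypothetical target text

# Like RegExr with 'global' unchecked: only the first match, the letter T
print(re.search(r".", target).group())  # T

# Like RegExr with 'global' checked: every character is a separate match
print(re.findall(r".", target))  # ['T', 'h', 'e', ' ', 'R', 'i', 'm', 'e']
```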

Note from the Author or Editor:
This is correct. I have made a new screen shot and will send it to you.

my_oreilly  Aug 31, 2012  Nov 21, 2012
Printed
Page 26
United States

I wanted to bring out a misspelling I discovered on page 26. Within the last bullet point recommendation, it is explained that: "All these operations are performed again the file rime.txt." Basing off the previous bullet points list, I believe the correct bullet point should use the word "against" as indicated on page 26. Or the author's intention may have been to indicate that: "All these operations are performed again to the file rime.txt." I'm pretty sure the original intention was: "All these operations are performed against the file rime.txt". I'll attach a digital copy of the page for proof. http://i.imgur.com/upfOIZv.png -Take care!

Note from the Author or Editor:
Yes, "again" should be "against."

Justin Page  Dec 22, 2013 
PDF
Page 30
Figure 3-1 & paragraph 3

Figure 3-1 shows RegExr in Safari with the descriptive paragraph "How...Country" selected by the expression ^How.*Country\.$ and the 'global' and 'multiline' options selected. The following text says that: "global is checked by default when you open RegExr, but you can leave it checked or unchecked for this example." I am using RegExr in Firefox 15 and the desktop version for Windows. I can only get the same result as shown in Figure 3-1 with the 'dotall' and 'multiline' options selected. (The figure does not show 'dotall' selected.)
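The submitter's result follows from how the dot treats newlines. Assuming the "How...Country" paragraph spans several lines (which the need for 'dotall' implies), `.*` cannot cross a line break unless dotall is on. This sketch uses Python's `re.DOTALL` and `re.MULTILINE` as stand-ins for RegExr's options, with an abbreviated, hypothetical paragraph.

```python
import re

# Abbreviated, hypothetical stand-in for the multi-line "How...Country" paragraph
paragraph = (
    "How a Ship having passed the Line\n"
    "was driven by Storms\n"
    "and came back to his own Country."
)
pattern = r"^How.*Country\.$"

# multiline alone: "." stops at each newline, so nothing spans the paragraph
print(re.search(pattern, paragraph, re.MULTILINE))  # None

# dotall + multiline: "." also matches newlines, so the whole paragraph matches
m = re.search(pattern, paragraph, re.DOTALL | re.MULTILINE)
print(m.group() == paragraph)  # True
```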

Note from the Author or Editor:
This is correct and I have corrected the file (ch03.asciidoc).

my_oreilly  Aug 31, 2012  Nov 21, 2012
PDF
Page 33
bottom of the page (output)

Using pcregrep under Cygwin, I cannot get the same output as shown:
1. I don't get line 1, because that line in rime.txt doesn't end with MARINERE but with PARTS.
2. The other line numbers are off by one: instead of 10, 38, ..., I get 9, 37, ...
3. If I insert a newline into rime.txt after MARINERE, I get the desired output, but this is at odds with the output shown later in the section on sed, where the first line is shown to end with "...PARTS."

Note from the Author or Editor:
This has been corrected in ch03.asciidoc.

my_oreilly  Sep 05, 2012  Nov 21, 2012
PDF
Page 36
section "Adding Tags with sed", 1st sed command line

The text says: "The backslashes in front of the quotation marks escape the quotes so that they are seen as literal characters, not part of the command." But the command line shown does not have backslashes in front of the quotation marks. They are, however, present in the file top.sed.

Note from the Author or Editor:
This is correct. I have made the correction in ch03.asciidoc in Atlas.

my_oreilly  Sep 05, 2012  Nov 21, 2012
PDF
Page 55
figure 5-2

Figure 5-2 caption says: "Figure 5-2. Negated character class with Regexpal in Opera" but figure shows RegExr not Regexpal

Note from the Author or Editor:
This is correct. I have corrected it in the production file ch05.asciidoc, that is, "Regexpal" now reads "RegExr".

my_oreilly  Sep 05, 2012  Nov 21, 2012
Printed
Page 64-65
2nd sentence of chapter

In "Matching Characters with Octal Numbers," the 2nd sentence says: "In regex, this is done with three digits, preceded by a slash (\)." However, the correct text is: "In regex, this is done with three digits, preceded by a slash, percentage sign, and o for octal (\%o)." On the next page, the searches shown need correction as well: wrong: \351, correct: \%o351; wrong: \u00e9, correct: \%u00e9. The last sentence says: "\351 matches , ..." That should instead be: "\%o351 matches , ..." See also: http://vimdoc.sourceforge.net/htmldoc/pattern.html#/\%o

Note from the Author or Editor:
[on page 64] In regex, this is done with three digits, preceded by a slash (\), or in _vim_, by a slash, percentage sign, and _o_ for octal (\%o). [on page 65] \351 or \%o351 is the same as: \u00e9 or \%u00e9
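The equivalence in the corrected text can be checked directly: octal 351 is 3×64 + 5×8 + 1 = 233, which is hex E9, the code point of é. This sketch uses Python's `re`, which accepts the plain \351, \xe9, and \u00e9 forms; the vim-specific \%o351 and \%u00e9 spellings are not Python syntax and are not tested here.

```python
import re

# Octal 351 = 3*64 + 5*8 + 1 = 233 = hex E9, the code point of "é"
assert 0o351 == 0xE9 == ord("é")

for pattern in (r"\351", r"\xe9", r"\u00e9"):
    # Each line prints the pattern followed by True
    print(pattern, bool(re.fullmatch(pattern, "é")))
```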

Berny  Jun 21, 2014 
Printed
Page 85
United States

It appears that there is an error in the first RegExr example in the "Negative Lookbehinds" subsection of Chapter 8, "Lookarounds." Specifically, you indicate case-insensitivity with '(?1)', that is, a one rather than an 'i'. I believe the correct regular expression should be '(?i)(?<!ancyent) marinere', not '(?1)(?<!ancyent) marinere'. As always, I will provide a copy of the page below: http://i.imgur.com/apxDzPQ.jpg
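This sketch runs the corrected expression with Python's `re` (standing in for RegExr) against a hypothetical two-line sample written in the spirit of rime.txt. It also shows that the misprinted `(?1)` is not a case-insensitivity flag: Python rejects it outright, and other flavors may reject it or read it as something else entirely.

```python
import re

# Hypothetical two-line sample in the spirit of rime.txt
text = "It is an ancyent Marinere\nThe bright-eyed Marinere"

# Corrected expression: (?i) turns on case-insensitive matching, and the
# negative lookbehind skips " Marinere" when it follows "ancyent"
print(re.findall(r"(?i)(?<!ancyent) marinere", text))  # [' Marinere']

# The misprinted (?1) is not an inline flag, so Python's re refuses to compile it
try:
    re.compile(r"(?1)(?<!ancyent) marinere")
except re.error as exc:
    print("rejected:", exc)
```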

Note from the Author or Editor:
This is correct. The 1 should be an i.

Justin Page  Jan 16, 2014 
ePub
Page 122
After first paragraph

The text gives pcre --help as the command to see the options of pcregrep. It should be: pcregrep --help

Note from the Author or Editor:
This erratum is correct: pcre --help should be pcregrep --help. It has been corrected in the source file (ch03.asciidoc).

lluang  Sep 03, 2012  Nov 21, 2012
ePub
Page 130
2nd text paragraph

Third sentence, second text paragraph states: "The sed command is a little simpler, put Perl is a lot more powerful" It should read "but Perl is a lot more powerful"

Note from the Author or Editor:
This is a typo, and the erratum has been corrected in source: "put" => "but".

lluang  Sep 03, 2012  Nov 21, 2012