Regular Expressions Cookbook

Errata for Regular Expressions Cookbook



The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.




Version Location Description Submitted By Date Submitted Date Corrected
PDF
Page various
various (see description)

The following regular expressions all include the character class [\s\S] and are listed as JavaScript-only, although they in fact work with all regex flavors covered by the book. The intention was to suggest that they should only be used in JavaScript, because more appropriate alternatives are already listed for other regex flavors. Thus, whether what's currently printed is actually wrong is debatable, but to avoid confusion it is better to expand the regex flavor lists referenced below. Some related text changes are also listed.

Page 34:
----------
Printed: Regex flavor: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 35:
----------
Printed (3rd paragraph): '.' thus matches any single character except a newline character.
Corrected: . thus matches any single character except a newline character.

Page 245:
----------
Printed: Regex flavor: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 246 (two fixes):
----------
Printed (1st paragraph): In this case, the pattern .* (or [\S\s]* in the JavaScript version) is used to simply match the entire subject text with no added constraints.
Corrected: In this case, the pattern .* (or [\S\s]* in the version that adds JavaScript support) is used to simply match the entire subject text with no added constraints.

Printed (2nd paragraph): This regex uses the dot matches line breaks option to allow the dots to match all characters, including line breaks. See Recipe 3.4 for details about how to apply this modifier with your programming language. The JavaScript regex is different, since JavaScript does not have a dot matches line breaks option. See Any character including line breaks on page 35 in Recipe 2.4 for more information.
Corrected: The first regex uses the dot matches line breaks option so that it will work correctly when your subject string contains line breaks. See Recipe 3.4 for details about how to apply this modifier with your programming language. JavaScript doesn't have a dot matches line breaks option, so the second regex uses a character class that matches any character. See Any character including line breaks on page 35 in Recipe 2.4 for more information.

Page 306:
----------
Printed: Regex flavor: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 309 x 2 (two occurrences, both with the same corrected replacement):
----------
Printed: Regex options: ^ and $ match at line breaks Regex flavor: JavaScript
Corrected: Regex options: ^ and $ match at line breaks ("dot matches line breaks" must not be set) Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 429:
----------
Printed: Regex flavors: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 430:
----------
Printed: Regex flavors: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 458:
----------
Printed: Regex flavor: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 460:
----------
Printed: Regex flavor: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Page 463 x 2 (two occurrences, both with the same corrected replacement):
----------
Printed: Regex flavor: JavaScript
Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Steven Levithan
O'Reilly Author 
Jul 10, 2009 
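
The point of the entry above can be checked with a minimal Python sketch: the character class [\s\S] matches any character, line breaks included, in a flavor other than JavaScript. The subject string and variable names are illustrative assumptions, not taken from the book.

```python
import re

subject = "first line\nsecond line"  # hypothetical subject text

# Without the "dot matches line breaks" option, the dot stops at the newline.
dot_match = re.match(r".*", subject).group()

# [\s\S] is "whitespace or non-whitespace", so it matches any character.
class_match = re.match(r"[\s\S]*", subject).group()

# re.DOTALL is Python's "dot matches line breaks" option.
dotall_match = re.match(r".*", subject, re.DOTALL).group()

print(repr(dot_match))
print(repr(class_match))
print(repr(dotall_match))
```

In flavors that have a "dot matches line breaks" option, that option is usually the more readable choice, which matches the entry's stated intention.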
Printed
Page ix
3rd paragraph, 1st sentence

as published: "... in situations where people with limited with regular expressions experience ..." corrected: "... in situations where people with limited regular expressions experience ..."

Jeff Roberson  Jun 28, 2009  Aug 01, 2009
PDF
Page 28
3rd paragraph, I think?

Solution

\a\e\f\n\r\t\v
Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\x07\x1B\f\n\r\t\v
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Perl does not support the \v vertical tab escape sequence. It must be represented in Perl using a hexadecimal (\x0B) or octal (\013) escape sequence instead.

Note from the Author or Editor:
On page 28, in the Solution section, remove Perl from the list of regex flavors for both given solutions. Add a 3rd solution:

\a\e\f\n\r\t\x0B
Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

On page 29, append this sentence to the first paragraph (which starts with "The ECMA-262 standard..."): Perl does not support \v, so we have to use a different syntax for the vertical tab in Perl. In this sentence, \v should be formatted as a regular expression, just as \a and \e are in that sentence.

John W. Krahn  Jun 12, 2009  Aug 01, 2009
Printed
Page 45-46
List of Unicode Properties

Some Unicode properties are missing from the list. These may include <\p{Lu}>, <\p{L&}>, <\p{Lm}>, and <\p{Mc}>.

Note from the Author or Editor:
In the "Unicode property or category" list beginning on page 45, the following 3 items should be added:
Insert between \p{Ll} and \p{Lt}: \p{Lu} An uppercase letter that has a lowercase variant
Insert between \p{Lt} and \p{Lo}: \p{Lm} A special character that is used as a letter
Insert between \p{Mn} and \p{Me}: \p{Mc} A character intended to be combined with another character that takes up extra space (vowel signs in many Eastern scripts)

Yao G  Aug 25, 2009 
Printed
Page 65
2nd and 3rd example

On page 65 your 2nd and 3rd examples would incorrectly match input that was not in the form of a hexadecimal number. For example, the input 9g01h would, incorrectly, succeed in making a match. The regex should be \b[a-fA-F0-9]{1,8}h?\b

Note from the Author or Editor:
In the Solution section, change both instances of [a-z0-9] to [a-f0-9]. Also, change the second "Hexadecimal number" subheading to "Hexadecimal number with optional suffix" to differentiate it from the first.

Anonymous  Oct 12, 2009 
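
The hexadecimal entry above can be illustrated with a short Python sketch. The strict pattern is the one proposed in the submission; the loose pattern is a hypothetical reconstruction based on the note's mention of [a-z0-9], not a quotation of the printed regex.

```python
import re

# Pattern proposed in the submission: hex digits only, optional "h" suffix.
strict = re.compile(r"\b[a-fA-F0-9]{1,8}h?\b")

# Hypothetical loose variant (assumption): any letter is accepted as a digit.
loose = re.compile(r"\b[a-z0-9]{1,8}h?\b", re.IGNORECASE)

loose_hit = loose.search("9g01h")    # accepts input that is not hexadecimal
strict_hit = strict.search("9g01h")  # rejects it
valid_hit = strict.search("1F4h")    # still accepts a real hex number

print(loose_hit, strict_hit, valid_hit)
```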
PDF
Page 65
Floating-point number section

The \b at the start of the regular expression under "floating point number" should be deleted. It prevents the regular expression from matching floating point numbers without an integer part, as is required by the problem statement for this recipe.

Note from the Author or Editor:
Delete \b at the start of the regular expression under "floating point number"

Jan Goyvaerts
O'Reilly Author 
Jul 09, 2009  Aug 01, 2009
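
Why a leading \b breaks numbers without an integer part can be seen in the following Python sketch. The pattern is a simplified stand-in chosen for illustration, not the book's floating-point regex.

```python
import re

# Simplified floating-point pattern (assumption), with and without leading \b.
with_boundary = re.compile(r"\b[-+]?[0-9]*\.?[0-9]+")
without_boundary = re.compile(r"[-+]?[0-9]*\.?[0-9]+")

subject = ".5"

# There is no word boundary between the start of the string and ".",
# so the \b version can only start matching at the digit.
bounded = with_boundary.search(subject).group()
unbounded = without_boundary.search(subject).group()

print(repr(bounded), repr(unbounded))
```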
Printed
Page 67
1st paragraph, 3rd sentence

as published: "<(\d\d){3}> matches a string of two, four or six digits." corrected: "<(\d\d){3}> matches a string of six digits."

Note from the Author or Editor:
In the first paragraph on page 67, change {3} into {1,3}. In the second paragraph, do NOT change the first occurrence of {3} at the start of the paragraph. Change the 2nd and 3rd occurrences of {3} in the second paragraph into {1,3}.

Jeff Roberson  Jun 28, 2009  Aug 01, 2009
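
The difference between the two quantifiers in the entry above can be verified with a small Python sketch. The test strings are illustrative assumptions.

```python
import re

exact = re.compile(r"(\d\d){3}")     # exactly three pairs: six digits
ranged = re.compile(r"(\d\d){1,3}")  # one to three pairs: two, four, or six digits

samples = ["12", "1234", "123456", "12345"]

# fullmatch() requires the whole string to match, so lengths are easy to compare.
exact_ok = [s for s in samples if exact.fullmatch(s)]
ranged_ok = [s for s in samples if ranged.fullmatch(s)]

print(exact_ok)
print(ranged_ok)
```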
Printed
Page 76
5th paragraph (first paragraph under "Negative lookaround"), first sentence

As printed: "<(?!regex)>, with an explanation point instead of..." Should be: "<(?!regex)>, with an exclamation point instead of..."

Nick Aldwin  Jul 20, 2009 
Printed
Page 78
2nd paragraph, 1st sentence

as published: "... character class subtra ction to match ..." corrected: "... character class subtraction to match ..."

Jeff Roberson  Jun 28, 2009  Aug 01, 2009
PDF
Page 81
Solution and Discussion

JavaScript does not support conditionals. In the Solution section, remove JavaScript from the first list of regex flavors (but leave it in the second list). Change "Java and Ruby" and "Java or Ruby" to "Java, JavaScript, and Ruby" and "Java, JavaScript, or Ruby". In the Discussion section, remove JavaScript from the first paragraph.

Jan Goyvaerts
O'Reilly Author 
Apr 05, 2010 
Printed
Page 96
2nd paragraph

As printed: "This chapter covers seven programming languages. Each recipe has separate solutions for all seven programming languages, and many recipes also have separate discussions for all seven languages." Change to: "This chapter covers eight programming languages. Each recipe has separate solutions for all eight programming languages, and many recipes also have separate discussions for all eight languages."

Yao G.  Aug 31, 2009 
Printed
Page 104
3rd paragraph, last sentence

"Only the closing delimiter needs to be escaped with a backslash." should it be changed to "Only the dollar sign needs to be escaped with a backslash."???

Note from the Author or Editor:
Change the sentence "Only the closing delimiter needs to be escaped with a backslash." into "If the opening and closing delimiters are different, only the closing delimiter needs to be escaped with a backslash if it occurs as a literal within the regular expression."

Yao G  Aug 27, 2009 
Printed
Page 130
1st sentence

"The Regex() class" should be "The Regex class".

Yao G.  Aug 28, 2009 
Printed
Page 132
last sentence before 3.7

"Follow Recipe 3.7 when partial matches are acceptable." Should it be changed to "Recipe 3.5" ?

Note from the Author or Editor:
The last sentence in recipe 3.6 should be changed to: "Follow Recipe 3.5 when partial matches are acceptable."

Yao G.  Sep 01, 2009 
PDF
Page 143
2nd paragraph of "Ruby", 3rd line at the end

"=~ variable" should be replaced with "$~ variable"

Jan Goyvaerts
O'Reilly Author 
Oct 10, 2009 
Printed
Page 147
Java Section, last sentence

"Group(n) returns null,..." should be "Group() returns null,...".

Note from the Author or Editor:
Change this at the end of the Java section on page 147: "group(n) returns null, whereas start() and end() both return -1." into this: "group(n) returns null, whereas start(n) and end(n) both return -1."

Yao G.  Aug 28, 2009 
Printed
Page 156
Java section

In the comment: "Here you can process the match stored in regexMacher" should be "... regexMatcher"

Note from the Author or Editor:
Change regexMacher into regexMatcher

Hunter Johnson  Jul 28, 2009 
PDF
Page 160
1st paragraph

In JavaScript, string.match(/regexp/) works identically to /regexp/.exec(string). string.match only differs when provided a regex that uses /g. Printed: This problem does not exist with string.match() (Recipe 3.10) or string.replace() (Recipe 3.14). Corrected: This problem does not exist with string.replace() (Recipe 3.14) or when finding all matches with string.match() (Recipe 3.10).

Steven Levithan
O'Reilly Author 
Jul 12, 2009  Aug 01, 2009
Printed
Page 168
regex following 2nd paragraph

as published: "\d+(?=(?:.(?!<b>))*</b>)" corrected: "\d+(?=(?:(?!<b>).)*</b>)" The as published version works properly for the given test subject ("1 <b>2</b> 3 4 <b>5 6 7</b>"), but does not handle the case where a number is immediately followed by an opening bold tag. If you remove all the spaces from the test subject string ("1<b>2</b>34<b>567</b>"), the as published regex erroneously matches all the numbers both inside and outside the bold tags. In the corrected regex, the dot must follow the negative lookahead, otherwise it will consume the first char of the opening bold tag.

Jeff Roberson  Jun 28, 2009  Aug 01, 2009
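
The behavior described in the entry above can be reproduced with a Python sketch that runs both regexes against both test subjects quoted in the description. Only the variable names are assumptions.

```python
import re

published = re.compile(r"\d+(?=(?:.(?!<b>))*</b>)")
corrected = re.compile(r"\d+(?=(?:(?!<b>).)*</b>)")

spaced = "1 <b>2</b> 3 4 <b>5 6 7</b>"
packed = "1<b>2</b>34<b>567</b>"  # same subject with all spaces removed

published_spaced = published.findall(spaced)
corrected_spaced = corrected.findall(spaced)
published_packed = published.findall(packed)
corrected_packed = corrected.findall(packed)

print(published_spaced, corrected_spaced)
print(published_packed, corrected_packed)
```

In the published version the dot consumes the "<" of an opening tag before the negative lookahead is tested, which is why numbers directly followed by <b> slip through.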
Printed
Page 168
First footnote

Because there are two authors, "...they only end up proving my point that..." should be "...they only end up proving our point that..." or "...they only end up proving the authors' point that...".

Note from the Author or Editor:
Change "my point" into "our point" in the footnote.

Jim.Monty  Jul 04, 2009  Aug 01, 2009
Printed
Page 171
last sentence above code

As printed: "..., you should use the Regex object with full exception handling:" Change to: "..., you should use the Matcher object with full exception handling:"

Yao G.  Aug 30, 2009 
Printed
Page 175
Line 4

As printed: "When searching for an array or regular" Corrected: "When searching for an array of regular"

Yao G.  Aug 31, 2009 
Printed
Page 175
last sentence above the "Perl" section

As printed: "... to preg_replace." Corrected: "... to preg_replace()."

Yao G.  Aug 31, 2009 
Printed
Page 206-207
1st paragraph on 206, in 3 paragraphs on 207

The input string is "I like <b>bold</b> and <i>italic</i> fonts", but in a number of the discussion sections it says: Simply put, you'll get an array with: I like , <b>, bold, </b>, and , <italic>, and italic</italic> fonts. It should read: Simply put, you'll get an array with: I like , <b>, bold, </b>, and , <i>, and italic</i> fonts.

Note from the Author or Editor:
In recipe 3.20, which runs from page 203 to page 207, <italic> and </italic> (incorrect HTML tags) should be replaced with <i> and </i> (correct HTML tags for italic).

Jared Crookston  May 05, 2010 
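
The array described in the entry above can be reproduced with a Python sketch. The tag regex and the split limit are assumptions chosen to yield the listed result; they are not quoted from recipe 3.20.

```python
import re

subject = "I like <b>bold</b> and <i>italic</i> fonts"

# Capturing the tag keeps each delimiter in the result; maxsplit=3 is an
# assumed limit that leaves the rest of the subject as one final element.
parts = re.split(r"(<[^<>]*>)", subject, maxsplit=3)

print(parts)
```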
PDF
Page 210
2nd paragraph?

Perl

If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text:

    $lines = split(m/\r?\n/, $subject)

Then, iterate over the $lines array:

    foreach $line ($lines) {
        if ($line =~ m/regex pattern/) {
            # The regex matches $line
        } else {
            # The regex does not match $line
        }
    }

In Perl $lines is a scalar variable that can only hold one value. In the case of the split function above $lines will be assigned the number of the items that split returns and the actual data will be assigned to the @_ array. You need to change $lines to @lines for that to work properly.

Note from the Author or Editor:
On page 210 in the Perl section, the line:

    $lines = split(m/\r?\n/, $subject)

must be changed into:

    @lines = split(m/\r?\n/, $subject)

Similarly, the line:

    foreach $line ($lines) {

must be changed into:

    foreach $line (@lines) {

John W. Krahn  Jun 12, 2009  Aug 01, 2009
Printed
Page 210
Python code

As printed:

    lines = re.split("\r?\n", subject)
    reobj = re.compile("regex pattern")
    for line in lines:
        if re.search(line):
            # the regex matches line
        else:
            # the regex does not match line

This is the corrected version. The object returned from re.compile() should be used to call search():

    lines = re.split("\r?\n", subject)
    reobj = re.compile("regex pattern")
    for line in lines:
        if reobj.search(line):
            # the regex matches line
        else:
            # the regex does not match line

Note from the Author or Editor:
In the Python section on page 210, this line: if re.search(line): must be changed into: if reobj.search(line):

Tony Cappellini  Jun 14, 2009  Aug 01, 2009
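
A runnable version of the corrected Python listing is sketched below. The subject text and the \d pattern are placeholders standing in for the reader's own data and "regex pattern".

```python
import re

subject = "alpha 1\r\nbeta\ngamma 3"  # hypothetical multiline subject
lines = re.split("\r?\n", subject)
reobj = re.compile(r"\d")             # placeholder for "regex pattern"

matching = []
for line in lines:
    # Call search() on the compiled object; the module-level re.search()
    # needs both a pattern and a string, so re.search(line) raises TypeError.
    if reobj.search(line):
        matching.append(line)

print(matching)
```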
Printed
Page 215
1st paragraph, 1st sentence

As printed: "...the part of the domain name after the dot can only consist of letters." Change to: "...the part of the domain name after the last (rightmost?) dot can only consist of letters."

Note from the Author or Editor:
Change to: and that the part of the domain name after the last dot can only consist of letters.

Yao G.  Nov 29, 2009 
Printed
Page 215
All four regexes on this page

as published: "... [!#$%&'*+/=?`{|}~^-]+ ..." corrected: "... [\w!#$%&'*+/=?`{|}~^-]+ ..." If an email has a username that has a dot in it, the as-published regex will fail to match the part of the username following the dot if that portion has a word character in it (i.e. "\w"). In other words, the '\w' was erroneously dropped from the character class component of the regex which matches the portion of the username which follows a dot. This same error occurs in all four regexes on this page.

Note from the Author or Editor:
In all 4 regular expressions on page 215, the characters [! appear once as a pair. In all 4 regexes [! should be changed into [\w!

Jeff Roberson  Jun 28, 2009  Aug 01, 2009
Printed
Page 236
1st paragraph under "Variations", last line

As printed: "... the date cannot be ..." Change to: "... the time cannot be ..."

Yao G.  Sep 02, 2009 
Printed
Page 239
last paragraph (p.239), also 1st paragraph (p.240)

As printed: "Time, with optional microseconds...". "microseconds" here sounds like a misinterpretation. I guess any number of digits could be added after a decimal dot or comma to represent a fraction of a second in ISO 8601. So, technically it cannot be referred to as either "microseconds" or "milliseconds".

Note from the Author or Editor:
Change "microseconds" into "fractional seconds" in two places: at the bottom of page 239 and at the top of page 240.

Yao G.  Sep 02, 2009 
Printed
Page 239
last sentence above the regexes

As printed: "...XML Schema dateTime type:" Change to: "...XML Schema time type:"

Yao G.  Sep 02, 2009 
PDF
Page 244
3rd paragraph; 1st paragraph under "Solution" heading

The final sentence of the Solution section's first paragraph should use "regular expression" instead of "regular expressions". Printed: You can modify the regular expressions to allow any minimum or maximum text length, or allow characters other than A-Z. Corrected: You can modify the regular expression to allow any minimum or maximum text length, or allow characters other than A-Z.

Steven Levithan
O'Reilly Author 
Jul 09, 2009 
PDF
Page 249
Code listing under the heading "PHP (PCRE)"

The PHP source code example uses ^ as the start-of-string anchor along with \z as the end-of-string anchor. Although this works perfectly fine (since the /m modifier is not used), it would be better to use \A as the start-of-string anchor for consistency with the prior regex listings.
Printed: if (preg_match('/^(?>(?>\r\n?|\n)?[^\r\n]*){0,5}\z/', $_POST['subject'])) {
Corrected: if (preg_match('/\A(?>(?>\r\n?|\n)?[^\r\n]*){0,5}\z/', $_POST['subject'])) {

Steven Levithan
O'Reilly Author 
Jul 09, 2009  Aug 01, 2009
PDF
Page 275
Definition list under the heading "Validate the number"

There is a mistake in the description of the Discover card format. However, the included regexes are correct. ---------- Printed: Discover 16 digits, starting with 6011, or 15 digits starting with 5. Corrected: Discover 16 digits, starting with 6011 or 65. ---------- The numbers 6011 and 65 should use a fixed-width font. The number 16 at the beginning of the corrected sentence should not. -- Reported by Vikas Shukla at http://referencedesigner.com/blog/?p=328.

Steven Levithan
O'Reilly Author 
Jul 09, 2009  Aug 01, 2009
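
The corrected Discover description above (16 digits, starting with 6011 or 65) can be expressed as a Python sketch. The pattern is one possible encoding of that description and is an assumption here; the entry only states that the book's own regexes are already correct.

```python
import re

# 6011 + 12 digits, or 65 + 2 digits + 12 digits: 16 digits either way.
discover = re.compile(r"^6(?:011|5[0-9]{2})[0-9]{12}$")

starts_6011 = "6011" + "0" * 12   # 16 digits, made-up number
starts_65 = "65" + "0" * 14       # 16 digits, made-up number
fifteen_5 = "5" + "0" * 14        # 15 digits starting with 5 (the printed error)

results = [bool(discover.match(n)) for n in (starts_6011, starts_65, fifteen_5)]
print(results)
```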
Printed
Page 339
2nd paragraph, 2nd line

As printed: "... 16 hexadecimal decimal digits." Change to: "... 16 hexadecimal digits."

Yao G.  Oct 01, 2009 
Printed
Page 366
1st paragraph

as published: "you want to extract jan from http://jan@www.regexcookbook.com" According to RFC1738, usernames are not allowed in the http scheme. A better example would be to use the ftp scheme, which does allow the username:password component in a URL.

Note from the Author or Editor:
Change http://jan@www.regexcookbook.com into ftp://jan@www.regexcookbook.com

Jeff Roberson  Jun 28, 2009  Aug 01, 2009
PDF
Page 369
Paragraph before "see also"

Change this: "including those that don't specify the user" to: "including those that don't specify the host"

Jan Goyvaerts
O'Reilly Author 
Dec 08, 2009 
PDF
Page 370
Second subheading under Solution

Change this: "Extract the host while validating the URL" into: "Extract the port while validating the URL"

Jan Goyvaerts
O'Reilly Author 
Dec 08, 2009 
Printed
Page 371
4th paragraph

As printed: "Since we want to extract the host, we can exclude URLs that don't specify an authority." Change to: "Since we want to extract the port number, we can exclude URLs that don't specify a port number."

Yao G.  Nov 29, 2009 
Printed
Page 376
1st paragraph under "Discussion"

2nd sentence: "The query is delimited from the part of the URL before it with a hash sign." Change to: "The fragment is delimited from the part of the URL before it with a hash sign."

Yao G.  Dec 03, 2009 
PDF
Page 454
Multiple paragraphs

Page 454, first regex:
Change: ^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')
To: ^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*(?:"([^"]*)"|'([^']*)')

Page 454, second paragraph:
Change: This captures the entire class value and its surrounding quote marks to backreference 1.
To: This captures the entire class value to backreference 1 or 2, depending on the type of quote marks surrounding the value.

Page 454, fourth paragraph:
Change: Finally, if both of the previous regexes matched successfully, you'll want to search within backreference 1 of the second regex's matches using the following pattern:
To: Finally, if both of the previous regexes matched successfully, you'll want to search within backreferences 1 and 2 of the second regex's matches using the following pattern:

Note from the Author or Editor:
I got this submission backwards--I included the fix in the description, and have the details here. The error is that the previous version of the regex in question included the surrounding quote marks in the backreference, but the followup regex to search within the backreference for a class did not account for the quote marks.

Steven Levithan
O'Reilly Author 
Oct 27, 2009
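
How the corrected regex from the entry above spreads the class value across two backreferences is sketched below in Python. The helper function and the sample tags are hypothetical and only serve the illustration.

```python
import re

# Corrected regex from the entry: the value is captured without its quote
# marks, in group 1 (double quotes) or group 2 (single quotes).
class_attr = re.compile(
    r"""^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*(?:"([^"]*)"|'([^']*)')"""
)

def class_value(tag):
    """Return the raw class attribute value of an opening tag, or None."""
    match = class_attr.search(tag)
    if match is None:
        return None
    # Only one of the two groups participates in a successful match.
    return match.group(1) if match.group(1) is not None else match.group(2)

print(class_value('<div id="x" class="a b">'))
print(class_value("<p class='note'>"))
print(class_value('<p id="x">'))
```

Because the quote marks are no longer part of the captured text, a follow-up search for a single class name can run on the returned value directly.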