Errata for Regular Expressions Cookbook

The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction appears in the "Date corrected" column.

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date submitted	Date corrected
Printed	Page 206-207 1st paragraph on 206, in 3 paragraphs on 207	The input string is "I like <b>bold</b> and <i>italic</i> fonts", but in a number of the discussion sections it says: Simply put, you'll get an array with: I like , <b>, bold, </b>, and , <italic>, and italic</italic> fonts. It should read: Simply put, you'll get an array with: I like , <b>, bold, </b>, and , <i>, and italic</i> fonts. Note from the Author or Editor: In recipe 3.20, which runs from page 203 to page 207, <italic> and </italic> (incorrect HTML tags) should be replaced with <i> and </i> (correct HTML tags for italic).	Jared Crookston	May 05, 2010
PDF	Page 81 Solution and Discussion	JavaScript does not support conditionals. In the Solution section, remove JavaScript from the first list of regex flavors (but leave it in the second list). Change "Java and Ruby" and "Java or Ruby" to "Java, JavaScript, and Ruby" and "Java, JavaScript, or Ruby". In the Discussion section, remove JavaScript from the first paragraph.	Jan Goyvaerts	Apr 05, 2010
PDF	Page 370 Second subheading under Solution	Change this: "Extract the host while validating the URL" into: "Extract the port while validating the URL"	Jan Goyvaerts	Dec 08, 2009
PDF	Page 369 Paragraph before "see also"	Change this: "including those that don't specify the user" to: "including those that don't specify the host"	Jan Goyvaerts	Dec 08, 2009
Printed	Page 371 4th paragraph	As printed: "Since we want to extract the host, we can exclude URLs that don't specify an authority." Change to: "Since we want to extract the port number, we can exclude URLs that don't specify a port number."	Yao G.	Nov 29, 2009
Printed	Page 215 1st paragraph, 1st sentence	As printed: "...the part of the domain name after the dot can only consist of letters." Change to: "...the part of the domain name after the last (rightmost?) dot can only consist of letters." Note from the Author or Editor: Change to: "and that the part of the domain name after the last dot can only consist of letters."	Yao G.	Nov 29, 2009
PDF	Page 454 Multiple paragraphs	Page 454, first regex: Change: ^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*') To: ^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*(?:"([^"]*)"|'([^']*)') Page 454, second paragraph: Change: This captures the entire class value and its surrounding quote marks to backreference 1. To: This captures the entire class value to backreference 1 or 2, depending on the type of quote marks surrounding the value. Page 454, fourth paragraph: Change: Finally, if both of the previous regexes matched successfully, you'll want to search within backreference 1 of the second regex's matches using the following pattern: To: Finally, if both of the previous regexes matched successfully, you'll want to search within backreferences 1 and 2 of the second regex's matches using the following pattern: Note from the Author or Editor: I got this submission backwards--I included the fix in the description, and have the details here. The error is that the previous version of the regex in question included the surrounding quote marks in the backreference, but the followup regex to search within the backreference for a class did not account for the quote marks.	Steven Levithan	Oct 27, 2009
Printed	Page 65 2nd and 3rd example	On page 65 your 2nd and 3rd examples would incorrectly match input that was not in the form of a hexadecimal number. For example, the input 9g01h would, incorrectly, succeed in making a match. The regex should be \b[a-fA-F0-9]{1,8}h?\b Note from the Author or Editor: In the Solution section, change both instances of [a-z0-9] to [a-f0-9] Also, change the second "Hexadecimal number" subheading to "Hexadecimal number with optional suffix" to differentiate it from the first.	Anonymous	Oct 12, 2009
PDF	Page 143 2nd paragraph of "Ruby", 3rd line at the end	"=~ variable" should be replaced with "$~ variable"	Jan Goyvaerts	Oct 10, 2009
Printed	Page 339 2nd paragraph, 2nd line	As printed: "... 16 hexadecimal decimal digits." Change to: "... 16 hexadecimal digits."	Yao G.	Oct 01, 2009
Printed	Page 239 last sentence above the regexes	As printed: "...XML Schema dateTime type:" Change to: "...XML Schema time type:"	Yao G.	Sep 02, 2009
Printed	Page 239 last paragraph (p.239), also 1st paragraph (p.240)	As printed: "Time, with optional microseconds...". "microseconds" here sounds like a misinterpretation. I guess any number of digits could be added after a decimal dot or comma to represent a fraction of a second in ISO 8601. So, technically it cannot be referred as either "microseconds" or "milliseconds". Note from the Author or Editor: Change "microseconds" into "fractional seconds" in two places: at the bottom of page 239 and at the top of page 240.	Yao G.	Sep 02, 2009
Printed	Page 236 1st paragraph under "Variations", last line	As printed: "... the date cannot be ..." Change to: "... the time cannot be ..."	Yao G.	Sep 02, 2009
Printed	Page 132 last sentence before 3.7	"Follow Recipe 3.7 when partial matches are acceptable." Should it be changed to "Recipe 3.5" ? Note from the Author or Editor: The last sentence in recipe 3.6 should be changed to: "Follow Recipe 3.5 when partial matches are acceptable."	Yao G.	Sep 01, 2009
Printed	Page 96 2nd paragraph	As printed: "This chapter covers seven programming languages. Each recipe has separate solutions for all seven programming languages, and many recipes also have separate discussions for all seven languages." Change to: "This chapter covers eight programming languages. Each recipe has separate solutions for all eight programming languages, and many recipes also have separate discussions for all eight languages."	Yao G.	Aug 31, 2009
Printed	Page 175 last sentence above the "Perl" section	As printed: "... to preg_replace." Corrected: "... to preg_replace()."	Yao G.	Aug 31, 2009
Printed	Page 175 Line 4	As printed: "When searching for an array or regular" Corrected: "When searching for an array of regular"	Yao G.	Aug 31, 2009
Printed	Page 171 last sentence above code	As printed: "..., you should use the Regex object with full exception handling:" Change to: "..., you should use the Matcher object with full exception handling:"	Yao G.	Aug 30, 2009
Printed	Page 147 Java Section, last sentence	"Group(n) returns null,..." should be "Group() returns null,...". Note from the Author or Editor: Change this at the end of the Java section on page 147: "group(n) returns null, whereas start() and end() both return -1." into this: "group(n) returns null, whereas start(n) and end(n) both return -1."	Yao G.	Aug 28, 2009
Printed	Page 130 1st sentence	"The Regex() class" should be "The Regex class".	Yao G.	Aug 28, 2009
Printed	Page 45-46 List of Unicode Properties	Some Unicode properties are missing from the list; these may include <\p{Lu}>, <\p{L&}>, <\p{Lm}>, and <\p{Mc}>. Note from the Author or Editor: In the "Unicode property or category" list beginning on page 45, the following 3 items should be added: Insert between \p{Ll} and \p{Lt}: \p{Lu} An uppercase letter that has a lowercase variant Insert between \p{Lt} and \p{Lo}: \p{Lm} A special character that is used as a letter Insert between \p{Mn} and \p{Me}: \p{Mc} A character intended to be combined with another character that takes up extra space (vowel signs in many Eastern scripts)	Yao G	Aug 25, 2009
Printed	Page 156 Java section	In the comment: "Here you can process the match stored in regexMacher" should be "... regexMatcher" Note from the Author or Editor: Change regexMacher into regexMatcher	Hunter Johnson	Jul 28, 2009
Printed	Page 76 5th paragraph (first paragraph under "Negative lookaround"), first sentence	As printed: "<(?!regex)>, with an explanation point instead of..." Should be: "<(?!regex)>, with an exclamation point instead of..."	Nick Aldwin	Jul 20, 2009
PDF	Page 160 1st paragraph	In JavaScript, string.match(/regexp/) works identically to /regexp/.exec(string). string.match only differs when provided a regex that uses /g. Printed: This problem does not exist with string.match() (Recipe 3.10) or string.replace() (Recipe 3.14). Corrected: This problem does not exist with string.replace() (Recipe 3.14) or when finding all matches with string.match() (Recipe 3.10).	Steven Levithan	Jul 12, 2009	Aug 01, 2009
PDF	Page various various (see description)	The following regular expressions all include the character class [\s\S] and are listed as JavaScript-only, although they in fact work with all regex flavors covered by the book. The intention was to suggest that they should only be used in JavaScript, because more appropriate alternatives are already listed for other regex flavors. Thus, whether what's currently printed is actually wrong is debatable, but to avoid confusion it is better to expand the regex flavor lists referenced below. Some related text changes are also listed. Page 34: ---------- Printed: Regex flavor: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 35: ---------- Printed (3rd paragraph): '.' thus matches any single character except a newline character. Corrected: . thus matches any single character except a newline character. Page 245: ---------- Printed: Regex flavor: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 246 (two fixes): ---------- Printed (1st paragraph): In this case, the pattern .* (or [\S\s]* in the JavaScript version) is used to simply match the entire subject text with no added constraints. Corrected: In this case, the pattern .* (or [\S\s]* in the version that adds JavaScript support) is used to simply match the entire subject text with no added constraints. Printed (2nd paragraph): This regex uses the dot matches line breaks option to allow the dots to match all characters, including line breaks. See Recipe 3.4 for details about how to apply this modifier with your programming language. The JavaScript regex is different, since JavaScript does not have a dot matches line breaks option. See Any character including line breaks on page 35 in Recipe 2.4 for more information. Corrected: The first regex uses the dot matches line breaks option so that it will work correctly when your subject string contains line breaks. See Recipe 3.4 for details about how to apply this modifier with your programming language. JavaScript doesn't have a dot matches line breaks option, so the second regex uses a character class that matches any character. See Any character including line breaks on page 35 in Recipe 2.4 for more information. Page 306: ---------- Printed: Regex flavor: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 309 x 2 (two occurrences, both with the same corrected replacement): ---------- Printed: Regex options: ^ and $ match at line breaks Regex flavor: JavaScript Corrected: Regex options: ^ and $ match at line breaks ("dot matches line breaks" must not be set) Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 429: ---------- Printed: Regex flavors: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 430: ---------- Printed: Regex flavors: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 458: ---------- Printed: Regex flavor: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 460: ---------- Printed: Regex flavor: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Page 463 x 2 (two occurrences, both with the same corrected replacement): ---------- Printed: Regex flavor: JavaScript Corrected: Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby	Steven Levithan	Jul 10, 2009
PDF	Page 65 Floating-point number section	The \b at the start of the regular expression under "floating point number" should be deleted. It prevents the regular expression from matching floating point numbers without an integer part, as is required by the problem statement for this recipe. Note from the Author or Editor: Delete \b at the start of the regular expression under "floating point number"	Jan Goyvaerts	Jul 09, 2009	Aug 01, 2009
PDF	Page 244 3rd paragraph; 1st paragraph under "Solution" heading	The final sentence of the Solution section's first paragraph should use "regular expression" instead of "regular expressions". Printed: You can modify the regular expressions to allow any minimum or maximum text length, or allow characters other than A-Z. Corrected: You can modify the regular expression to allow any minimum or maximum text length, or allow characters other than A-Z.	Steven Levithan	Jul 09, 2009
PDF	Page 275 Definition list under the heading "Validate the number"	There is a mistake in the description of the Discover card format. However, the included regexes are correct. ---------- Printed: Discover 16 digits, starting with 6011, or 15 digits starting with 5. Corrected: Discover 16 digits, starting with 6011 or 65. ---------- The numbers 6011 and 65 should use a fixed-width font. The number 16 at the beginning of the corrected sentence should not. -- Reported by Vikas Shukla at http://referencedesigner.com/blog/?p=328.	Steven Levithan	Jul 09, 2009	Aug 01, 2009
PDF	Page 249 Code listing under the heading "PHP (PCRE)"	The PHP source code example uses ^ as the start-of-string anchor along with \z as the end-of-string anchor. Although this works perfectly fine (since the /m modifier is not used), it would be better to use \A as the start-of-string anchor for consistency with the prior regex listings. Printed: if (preg_match('/^(?>(?>\r\n?|\n)?[^\r\n]){0,5}\z/', $_POST['subject'])) { Corrected: if (preg_match('/\A(?>(?>\r\n?|\n)?[^\r\n]){0,5}\z/', $_POST['subject'])) {	Steven Levithan	Jul 09, 2009	Aug 01, 2009
Printed	Page 168 First footnote	Because there are two authors, "...they only end up proving my point that..." should be "...they only end up proving our point that..." or "...they only end up proving the authors' point that...". Note from the Author or Editor: Change "my point" into "our point" in the footnote.	Jim.Monty	Jul 04, 2009	Aug 01, 2009
Printed	Page 366 1st paragraph	as published: "you want to extract jan from http://jan@www.regexcookbook.com" According to RFC1738, usernames are not allowed in the http scheme. A better example would be to use the ftp scheme, which does allow the username:password component in a URL. Note from the Author or Editor: Change http://jan@www.regexcookbook.com into ftp://jan@www.regexcookbook.com	Jeff Roberson	Jun 28, 2009	Aug 01, 2009
Printed	Page 215 All four regexes on this page	as published: "... [!#$%&'+/=?`{|}~^-]+ ..." corrected: "... [\w!#$%&'+/=?`{|}~^-]+ ..." If an email has a username that has a dot in it, the as-published regex will fail to match the part of the username following the dot if that portion has a word character in it (i.e. "\w"). In other words, the '\w' was erroneously dropped from the character class component of the regex which matches the portion of the username which follows a dot. This same error occurs in all four regexes on this page. Note from the Author or Editor: In all 4 regular expressions on page 215, the characters [! appear once as a pair. In all 4 regexes [! should be changed into [\w!	Jeff Roberson	Jun 28, 2009	Aug 01, 2009
Printed	Page 168 regex following 2nd paragraph	as published: "\d+(?=(?:.(?!<b>))*</b>)" corrected: "\d+(?=(?:(?!<b>).)*</b>)" The as-published version works properly for the given test subject ("1 <b>2</b> 3 4 <b>5 6 7</b>"), but does not handle the case where a number is immediately followed by an opening bold tag. If you remove all the spaces from the test subject string ("1<b>2</b>34<b>567</b>"), the as-published regex erroneously matches all the numbers both inside and outside the bold tags. In the corrected regex, the dot must follow the negative lookahead; otherwise it will consume the first character of the opening bold tag.	Jeff Roberson	Jun 28, 2009	Aug 01, 2009
Printed	Page 78 2nd paragraph, 1st sentence	as published: "... character class subtra ction to match ..." corrected: "... character class subtraction to match ..."	Jeff Roberson	Jun 28, 2009	Aug 01, 2009
Printed	Page 67 1st paragraph, 3rd sentence	as published: "<(\d\d){3}> matches a string of two, four or six digits." corrected: "<(\d\d){3}> matches a string of six digits." Note from the Author or Editor: In the first paragraph on page 67, change {3} into {1,3}. In the second paragraph, do NOT change the first occurrence of {3} at the start of the paragraph. Change the 2nd and 3rd occurrences of {3} in the second paragraph into {1,3}.	Jeff Roberson	Jun 28, 2009	Aug 01, 2009
Printed	Page ix 3rd paragraph, 1st sentence	as published: "... in situations where people with limited with regular expressions experience ..." corrected: "... in situations where people with limited regular expressions experience ..."	Jeff Roberson	Jun 28, 2009	Aug 01, 2009
Printed	Page 210 Python code	lines = re.split("\r?\n", subject) reobj = re.compile("regex pattern") for line in lines: if re.search(line): # the regex matches line else: # the regex does not match line This is the corrected version. The object returned from re.compile() should be used to call search() lines = re.split("\r?\n", subject) reobj = re.compile("regex pattern") for line in lines: if reobj.search(line): # the regex matches line else: # the regex does not match line Note from the Author or Editor: In the Python section on page 210, this line: if re.search(line): must be changed into: if reobj.search(line):	Tony Cappellini	Jun 14, 2009	Aug 01, 2009
PDF	Page 210 2nd paragraph?	Perl If you have a multiline string, split it into an array of strings first, with each string in the array holding one line of text: $lines = split(m/\r?\n/, $subject) Then, iterate over the $lines array: foreach $line ($lines) { if ($line =~ m/regex pattern/) { # The regex matches $line } else { # The regex does not match $line } } In Perl $lines is a scalar variable that can only hold one value. In the case of the split function above $lines will be assigned the number of the items that split returns and the actual data will be assigned to the @_ array. You need to change $lines to @lines for that to work properly. Note from the Author or Editor: On page 210 in the Perl section, the line: $lines = split(m/\r?\n/, $subject) must be changed into: @lines = split(m/\r?\n/, $subject) Similarly, the line: foreach $line ($lines) { must be changed into: foreach $line (@lines) {	John W. Krahn	Jun 12, 2009	Aug 01, 2009
PDF	Page 28 3rd paragraph, I think?	Solution \a\e\f\n\r\t\v Regex options: None Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby \x07\x1B\f\n\r\t\v Regex options: None Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby Perl does not support the \v vertical tab escape sequence. It must be represented in Perl using a hexadecimal (\x0B) or octal (\013) escape sequence instead. Note from the Author or Editor: On page 28, in the Solution section, remove Perl from the list of regex flavors for both given solutions. Add a 3rd solution: \a\e\f\n\r\t\x0B Regex options: None Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby On page 29, append this sentence to the first paragraph (which starts with "The ECMA-262 standard..."): Perl does not support \v, so we have to use a different syntax for the vertical tab in Perl. In this sentence, \v should be formatted as a regular expression, just as \a and \e are in that sentence.	John W. Krahn	Jun 12, 2009	Aug 01, 2009
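
Two of the regex corrections above can be checked with a short script. The following is a minimal sketch in Python, not code from the book. It assumes the page 168 regexes carry a `*` quantifier after the non-capturing group, which the erratum's description of the matching behavior implies, and it uses the hexadecimal regex exactly as the page 65 submitter proposed it. The sample inputs `9g01h` and the tag-only subject string come from the errata; `1A2Fh` and all variable names are made up for illustration.

```python
import re

# Page 168 erratum: the dot must come AFTER the negative lookahead.
# Assumption: both patterns have a * quantifier after the group.
published = re.compile(r"\d+(?=(?:.(?!<b>))*</b>)")
corrected = re.compile(r"\d+(?=(?:(?!<b>).)*</b>)")

# Test subject from the erratum, with all spaces removed.
subject = "1<b>2</b>34<b>567</b>"

# The published pattern lets the dot swallow the "<" of an opening tag,
# so numbers outside the bold tags are matched as well.
published_matches = published.findall(subject)

# The corrected pattern checks for "<b>" before consuming each character,
# so only numbers inside bold tags are matched.
corrected_matches = corrected.findall(subject)

# Page 65 erratum: regex proposed by the submitter, restricted to a-f.
hex_regex = re.compile(r"\b[a-fA-F0-9]{1,8}h?\b")

# "9g01h" is the invalid input quoted in the erratum; "1A2Fh" is a
# hypothetical valid input chosen for this sketch.
invalid_hex = hex_regex.search("9g01h")
valid_hex = hex_regex.search("1A2Fh")

print(published_matches)
print(corrected_matches)
print(invalid_hex, valid_hex.group())
```

Comparing `published_matches` with `corrected_matches` shows the difference the submitter describes: only the corrected pattern limits its matches to the numbers between `<b>` and `</b>`.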