Hadoop: The Definitive Guide

Errata for Hadoop: The Definitive Guide

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Page 27
Example 2-6 main() function

The instance "job" changed from the example on page 22 where the instance was "conf". There are 6 usages of job that should be highlighted in bold.

John Martin  Jan 05, 2012 
Page 32
1st paragraph

In the first sentence after the parentheses, the phrase "between the maps and reduces" should read "between the maps and reducers".

Johnny Tolliver  Nov 21, 2011 
Page 33
2nd line

In the 3rd edition of the book, page 33, line 2, the cross-reference

In "The Default MapReduce Job" on page 227.

should read "on page 228".

Bill Zhao  Jun 08, 2012 
Page 34
Example 2-9 code

max_val should be initialized to the smallest negative integer possible instead of 0.

You could have the unlikely case that a year consists entirely of negative temperatures. If this happened, the code as currently written would then erroneously return 0 as the maximum temperature instead of the largest negative temperature from the actual data set.

This is also true for the Python code in example 2-10 on page 36.

David Egts  Sep 24, 2011 
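
The initialization problem reported for page 34 can be reproduced outside the book's scripts. The following is a minimal sketch in Java rather than the Ruby and Python of Examples 2-9 and 2-10; the class name, method names, and sample readings are hypothetical and chosen only for illustration.

```java
public class MaxTemperatureInit {

    // Mirrors the reported behavior: starting at 0 hides any maximum below zero.
    static int maxStartingAtZero(int[] temps) {
        int maxVal = 0;
        for (int t : temps) {
            maxVal = Math.max(maxVal, t);
        }
        return maxVal;
    }

    // Suggested fix: start from the smallest representable int instead of 0.
    static int maxStartingAtMinValue(int[] temps) {
        int maxVal = Integer.MIN_VALUE;
        for (int t : temps) {
            maxVal = Math.max(maxVal, t);
        }
        return maxVal;
    }

    public static void main(String[] args) {
        // Hypothetical year in which every reading is below zero.
        int[] allNegative = {-111, -78, -22};
        System.out.println(maxStartingAtZero(allNegative));     // prints 0
        System.out.println(maxStartingAtMinValue(allNegative)); // prints -22
    }
}
```

With any positive reading in the input the two variants agree, which is why the problem only shows up for a year consisting entirely of negative temperatures.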
Page 35
1st paragraph

"In this case, the keys are the weather station identifiers" should read "In this case, the keys are the years."

The key-value pair output from the map on page 33 is the year and temperature, not the weather station identifier and temperature.

David Egts  Sep 24, 2011 
Printed, PDF,
Page 55
code snippet in middle of page

The discussion in that and the previous page is about the PositionedReadable interface. Here's the confusing snippet:

All of these methods preserve the current offset in the file and are thread-safe.. In fact, they are just implemented using the
Seekable interface using the following pattern:

long oldPos = getPos();
try {
  seek(position);
  // read data
} finally {
  seek(oldPos);
}

Now, clearly, that is _not_ a thread-safe pattern. On checking the actual source code, I found that the pattern is in fact wrapped in a synchronized block (as it should be).

So the implementation is thread safe, but not exactly concurrent. And since I/O is many orders of magnitude slower than most any other operation, the window in which this mutex is held is indeed quite long.

It would be better to clear this all up, by either (i) omitting the implementation details altogether, (ii) explicitly wrapping the code snippet in a synchronized block, and/or (iii) noting that while the operation is thread safe, it's not designed for concurrent access.

(By contrast, if this were implemented via an NIO FileChannel then it would be both thread safe _and_ concurrent--something many a reader knows.)

Note from the Author or Editor:
Thanks for your analysis - I agree entirely. For the third edition I've opted to omit the implementation details (i) and mention that a single instance of FSDataInputStream is not designed for concurrent access (iii), and it's better to create multiple instances.

Babak  Apr 16, 2011 
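
The distinction drawn in the page 55 erratum can be sketched with plain JDK classes. This is not Hadoop's implementation: LockedReader is a hypothetical wrapper that applies the seek, read, restore pattern inside a synchronized method, and channelRead uses FileChannel's positional read, which is documented not to modify the channel's position. The file contents and offsets are made up for the demonstration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionedReadSketch {

    // Hypothetical wrapper (not a Hadoop class): the whole seek/read/restore
    // sequence runs under one lock, so it is thread safe, but callers are
    // serialized for the duration of the I/O.
    static final class LockedReader {
        private final RandomAccessFile file;

        LockedReader(RandomAccessFile file) {
            this.file = file;
        }

        synchronized int read(long position, byte[] buf) throws IOException {
            long oldPos = file.getFilePointer();
            try {
                file.seek(position);
                return file.read(buf, 0, buf.length);
            } finally {
                file.seek(oldPos); // restore the caller-visible offset
            }
        }
    }

    // Positional read on a FileChannel: no seek and no shared offset to
    // restore, so no lock is needed in this code.
    static int channelRead(FileChannel ch, long position, byte[] buf) throws IOException {
        return ch.read(ByteBuffer.wrap(buf), position);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("positioned-read", ".txt");
        Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
        byte[] buf = new byte[3];
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
             FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            new LockedReader(raf).read(4, buf);
            System.out.println(new String(buf, StandardCharsets.US_ASCII));
            System.out.println("offset after locked read: " + raf.getFilePointer());

            channelRead(ch, 7, buf);
            System.out.println(new String(buf, StandardCharsets.US_ASCII));
            System.out.println("offset after channel read: " + ch.position());
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Both variants leave the current offset untouched. The difference is that the locked variant holds its mutex across the read, which is the long window the erratum describes, while the channel variant leaves any coordination to the underlying platform.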
Page 136
Box: Which Properties Can I Set?

The box points to the book's website for a configuration property reference. I looked but could not find the reference there.

Note from the Author or Editor:
You're right, there is no property reference - I had planned to write one, but have never done so, I'm afraid. The best bet is to look at the *-default.xml files, which list all the default settings, with some documentation for each one. You can also view them online - I'm adding the following sentence to the third edition to make this clear (and removing the sentence that is the subject of this erratum):

The default settings documentation files can be found online at URLs of the form<version>/<component>-default.html;
for example the HDFS defaults for release 1.0.0 are at

Anonymous  Oct 13, 2011 
Page 160
6th paragraph

"if set in the tasktracker's mapred-site.html file" should be "if set in the tasktracker's mapred-site.xml file"

Note from the Author or Editor:
This error is on page 136 in the print version of the second edition.

Dave Brondsema  Oct 25, 2010  Apr 21, 2011
3rd paragraph

"Hadoop's uses a buffer size..." should be without the 's.

Lars George  Dec 16, 2010  Apr 21, 2011
Description of last item

"...the reduce begins, to give the reduces as much..." should be "the reduce begins, to give the reducers as much...", i.e. "reducers" not "reduces".

Lars George  Dec 16, 2010  Apr 21, 2011
Page 214
2nd paragraph

"Combined with SequenceFile.Reader's appendRaw() method" should be "nextRaw()".

Note from the Author or Editor:
The replacement should be: "Combined with a process that creates sequence files with SequenceFile.Writer's appendRaw() method or SequenceFileAsBinaryOutputFormat"

Davey Yan  Dec 19, 2011 
Page 300
5th paragraph

In section "Audit Logging", there are two appearances of "org.apache.hadoop.fs.FSNamesystem.audit". They should be "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit". has the same mistake. I have updated it.

Note from the Author or Editor:
Change "org.apache.hadoop.fs.FSNamesystem.audit" to "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit" (I could only see one occurrence on the page.)

Jingguo Yao  Nov 28, 2010  Apr 21, 2011
Page 352
Second paragraph of FOREACH...GENERATE section

The third sentence of the second paragraph says, "B's second field is the third field of A ($1) with one added to it." The "$1" should be changed to a "$2".

Keith McDonald  Apr 02, 2011  Apr 21, 2011

> % ls /user/hive/warehouse/record/

should be

> % ls /user/hive/warehouse/records/

The trailing "s" is missing.

lan  Sep 29, 2010  Apr 21, 2011
Page 420
1st paragraph

"We ask the org.apache.hadoop.hbase.HBaseConfigurationn class..." there aren't supposed to be two n's in that class. (Via Doug Meil.)

Tom White  Oct 18, 2010  Apr 21, 2011
Page 433
3rd paragraph

"We will see how to do efficient sampling at this end of this section."

Should probably end with " the end of this section."
("the" instead of "this".)

ATZMON HEN TOV  Jan 25, 2013 
Page 473
First paragraph of Resilience and Performance section

Third sentence of first paragraph of "Resilience and Performance" section: "ZooKeeper replies on having low-latency connections..." should be "ZooKeeper relies on having low-latency connections..."

Keith McDonald  Apr 03, 2011  Apr 21, 2011