Errata

Errata for Hadoop: The Definitive Guide

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
	? "Anatomy of a File Write" when describing HDFS dataflow	Sorry that I can't give a page number; I am reading the book on Safari, which doesn't show one. This is a minor issue in chapter 3, "The Hadoop Distributed Filesystem", in the "Data Flow" subsection under "Anatomy of a File Write", in the paragraph starting with "As the client writes data (step 3)". That paragraph says: "The list of datanodes forms a pipeline, and here we’ll assume the replication level is three, so there are three nodes in the pipeline." I think it would be technically more correct to say "min replication level" instead of "replication level", because during a file write only the "min replication" number of nodes form a pipeline and are written synchronously (in the figure, number 4 shows a synchronous write pipeline); the remaining replicas (replication level minus min replication level) are updated asynchronously after the write succeeds. This is in fact already mentioned in the following paragraphs of the section. The change would just make the sentence less confusing: when it says "replication level", the reader can easily take it to mean the value of the "dfs.replication" parameter, whereas the sentence really means the value of the "dfs.namenode.replication.min" parameter.	Nezih Yigitbasi	Jul 18, 2015
Printed	Page 431 4th paragraph, RDBMSs	Language error	manju	Aug 08, 2023
Printed	Page 25 The class MaxTemperatureMapper	The class MaxTemperatureMapper should only extend the class Mapper, not the interface Mapper, which doesn't have the method map(LongWritable key, Text value, Context context).	Meng, Qingsong	Jun 03, 2019
Printed	Page 40 1st command	The command for streaming using Ruby files names the full path of the mapper, combiner, and reducer. The command seems to work only when the base names are used. % hadoop jar /usr/hdp/2.2.6.0-2800/hadoop-mapreduce/hadoop-streaming.jar -files ch02-mr-intro/src/main/ruby/max_temperature_map.rb,ch02-mr-intro/src/main/ruby/max_temperature_reduce.rb -input input/ncdc/all -output output -mapper max_temperature_map.rb -combiner max_temperature_reduce.rb -reducer max_temperature_reduce.rb	Jonathan Giddy	May 16, 2016
Printed	Page 74 1st paragraph	In the "Replica Placement" section, the author states: "Hadoop’s default strategy is to place the first replica on the same node as the client [...]. The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random." According to the official documentation, this was true for Hadoop version r1.2.1: "For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack". However, since version 2.4.1, the HDFS Architecture documentation reads as follows: "For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack". Considering that the fourth edition covers "Hadoop 2 exclusively" (2.5.1?), it seems that the replica placement strategy described in the book is no longer accurate, unless the cited documentation is wrong. References: http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Replica_Placement:_The_First_Baby_Steps	Juan Sebastian Cadena	Sep 02, 2015
PDF	Page 150 commands after 1st paragraph	The commands for running the ConfigurationPrinter are listed in the book as: % mvn compile % export HADOOP_CLASSPATH=target/classes/ % hadoop ConfigurationPrinter -conf conf/hadoop-localhost.xml \ | grep yarn.resourcemanager.address= yarn.resourcemanager.address=localhost:8032 But there is no target/ in the root project directory after running mvn compile. The second command should be: % export HADOOP_CLASSPATH=ch06-mr-dev/target/classes/	Colby Adams	Aug 27, 2018
PDF	Page 185 1st paragraph	The last sentence of this paragraph, "If a job fails, JobControl won't run its dependencies.", may be incorrect. I suspect it should be: "If a job fails, JobControl won't run the jobs depending on it."	sandbox wang	Nov 05, 2015
Printed	Page 249 Table 9-2, 10th Row, 2nd Column, First line.	The first line of the description for REDUCE_OUTPUT_RECORDS reads: "The number of reduce output records produced by all the maps in the job." This is a technical mistake; the line should read: "The number of reduce output records produced by all the reducers in the job."	C Raja	Nov 09, 2015
Printed, PDF, ePub	Page 268 Joins, First paragraph 3rd line	Crunch is misspelled as "Cruc" in the following line: "higher-level framework such as Pig, Hive, Cascading, Cruc, or Spark."	Anonymous	Nov 27, 2015
Printed	Page 508 3rd paragraph from the bottom	In this query: select station, year, avg(max_temperature) from ( select station, year, max(temperature) as max_temperature ... group by station, year ) mt group by station, year; The subquery produces a single (station, year, max_temperature) record for each (station, year) grouping ... so the outer select computes the "average" of a single temperature. Or am I missing something?	William	Jan 01, 2016
PDF	Page 554 Java code, records.filter anonymous class	The program in Java compares strings using the != operator, which will not work unless the strings in all the records in the RDD are interned. It should be ! rec[1].equals("9999") instead.	RealSkeptic	Jan 09, 2017
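
The string-comparison report in the last row (page 554) can be illustrated with a small standalone sketch. This is not the book's code: the class name MissingValueFilter, the method names, and the sample tab-separated record are assumptions made for illustration. Only the suggested fix, !rec[1].equals("9999"), comes from the erratum itself.

```java
public class MissingValueFilter {
    // Sentinel for a missing reading, as quoted in the erratum.
    static final String MISSING = "9999";

    // Compares object references, as the != operator does for Strings.
    // Returns true whenever rec[1] is a different object from the literal,
    // even if both hold the same characters.
    static boolean keepByReference(String[] rec) {
        return rec[1] != MISSING;
    }

    // Compares character content, which is what the filter intends.
    static boolean keepByValue(String[] rec) {
        return !MISSING.equals(rec[1]);
    }

    public static void main(String[] args) {
        // Hypothetical record; split() builds new String objects at run time,
        // so its fields are not the interned "9999" literal.
        String[] rec = "1950\t9999\t1".split("\t");

        System.out.println(keepByReference(rec)); // true: the missing reading is wrongly kept
        System.out.println(keepByValue(rec));     // false: the missing reading is dropped
    }
}
```

Putting the constant on the left of equals() also avoids a NullPointerException if a field is ever null, which is a design choice of this sketch rather than part of the submitted correction.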