Errata for Hadoop Operations

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed, ePub Page 17
Figure 2-5

Figure 2-5 is labeled "Namecode (Active)" and "Namecode (Standby)" instead of "Namenode".

Jeffrey Field  Jul 10, 2013 
Printed, PDF Page 19
First sentence

"More precisely, each datanode has a block pool for each namespace."

should be

"More precisely, each datanode has a block pool for each namenode".

Ricardo Colon  Oct 13, 2013 
PDF, Other Digital Version Page 22
1st paragraph

the current line is "As a convenience,
the -moveFromLocal and -moveToLocal commands will copy a file from or to
HDFS, respectively, and then remove the source file (see Example 2-4)."

However, -moveFromLocal moves a file from the local filesystem to HDFS, not from HDFS, and the description of -moveToLocal is reversed in the same way.

Roshan Pandey  Jul 19, 2016 
Printed Page 24
Last paragraph

I believe "developers writing client applications need to concern themselves with" should be "developers writing client applications needn't concern themselves with".

Jeffrey Field  Jul 10, 2013 
PDF Page 38
India

The resource management aspect of the jobtracker is run as a new daemon called the resource manager,; a separate daemon responsible for creating and allocating resources to multiple applications.

Note:
Rather than two semicolons, it is a comma followed by a semicolon. It should be a single semicolon.

Prabhat Kumar  Aug 17, 2014 
PDF Page 61
Table 4-4

The permissions for Datanode directories should be 0755. When I set it to 0700 as mentioned in the book, I got the following error in /var/log/hadoop/hdfs/hadoop-hdfs-datanode-localhost.localdomain.log while starting datanode

2013-12-11 05:07:19,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /data/1/dfs/dn, expected: rwxr-xr-x, while actual: rwx------

I am using Apache Hadoop 1.2.1 on CentOS 6.5. Installed hadoop using hadoop-1.2.1-1.x86_64.rpm

Anonymous  Dec 11, 2013 
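
Note on the Page 61 report above: the log message compares symbolic permission strings, while Table 4-4 is discussed in octal. The following is a minimal sketch, using only the Python standard library, of how the two notations line up. The helper name symbolic_mode is hypothetical and chosen for illustration only.

```python
import stat

def symbolic_mode(octal_mode: int) -> str:
    """Return the nine-character rwx string for a directory with these permission bits."""
    # stat.filemode() returns a string such as 'drwxr-xr-x';
    # drop the leading file-type character to match the log format.
    return stat.filemode(stat.S_IFDIR | octal_mode)[1:]

# 0755 is what the datanode in the report expected, 0700 is what it found.
for mode in (0o755, 0o700):
    print(f"{mode:04o} -> {symbolic_mode(mode)}")
```

Under this mapping, the "expected" value in the log corresponds to 0755 and the "actual" value to 0700, which matches the report.
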
Printed Page 63
3rd paragraph

The text says "... the act of cloning memory is a waste of time". Like every modern UNIX derivative, Linux has long recognised this, so, to quote its man page for fork(), "Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child." Nevertheless, following a fork(), the kernel has to commit enough memory to shadow every writable page of the child, just in case it should be needed, so the discussion of the need to tune vm.overcommit_memory remains correct.

Dominic Dunlop  Dec 13, 2012 
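
Note on the Page 63 report above: the setting it refers to can be inspected on a Linux host through /proc. The sketch below is illustrative only. The function names are hypothetical, and the descriptions paraphrase the three values documented for vm.overcommit_memory in the Linux kernel (0, 1 and 2). On hosts where the file is missing or unreadable, the reader function returns None.

```python
from pathlib import Path

# Paraphrased meanings of the three documented vm.overcommit_memory values.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond the commit limit",
}

def describe_overcommit(value: int) -> str:
    """Map a vm.overcommit_memory value to a short description."""
    return OVERCOMMIT_MODES.get(value, "unknown mode")

def current_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

mode = current_overcommit_mode()
if mode is None:
    print("vm.overcommit_memory is not available on this host")
else:
    print(f"vm.overcommit_memory = {mode}: {describe_overcommit(mode)}")
```
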
Printed Page 73
United States

Text at top of page 73 refers to "forty-eight 48-port 1GbE leaf switches" handling 2,304 hosts in Figure 4-6.

Figure 4-6 labels (left-hand side of diagram) depict 24 leaf switches and 1,152 hosts.

Possibly a simple cut/paste of the labels from Figure 4-5 (previous page).

Alex Moundalexis  Dec 05, 2012 
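
Note on the Page 73 report above: the two pairs of numbers can be checked with simple arithmetic. The sketch assumes that every one of the 48 ports on a leaf switch serves exactly one host, which is an assumption made here for illustration and not a statement taken from the book. The function name is hypothetical.

```python
PORTS_PER_LEAF = 48  # 48-port 1GbE leaf switches, as quoted in the report

def hosts_supported(leaf_switches: int, ports_per_leaf: int = PORTS_PER_LEAF) -> int:
    """Number of hosts when each leaf port is assumed to connect one host."""
    return leaf_switches * ports_per_leaf

print(hosts_supported(48))  # text on page 73: forty-eight leaf switches
print(hosts_supported(24))  # labels in Figure 4-6: 24 leaf switches
```

Under that assumption, 48 switches give 2,304 hosts and 24 switches give 1,152 hosts, so the text and the figure labels are each internally consistent but describe different topologies.
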
Printed Page 91
1st paragraph, 8th sentence

In the sentence:

"Loggers output their log events to an appender which is responsible to handling the event in some meaningful way."

- the word "to" should be replaced with "for".

Vijay Thakorlal  Jan 02, 2013 
Printed Page 93
3rd paragraph

Under the paragraph explaining the fs.default.name parameter the last line states:

"[...] you may find it easier to follow documentation and reference material if you don't have to constantly translate port numbers in your head"

The "if" should be replaced with "so".

Vijay Thakorlal  Jan 02, 2013 
Printed Page 94
United States

Text in dfs.data.dir section (line 3) reads "also a comma separate list" -- should read "also a comma separated list"

Alex Moundalexis  Dec 10, 2012 
Printed Page 97
United States

In dfs.datanode.failed.volumes.tolerated section, paragraph 2, line 6 reads: "should be invested immediately" ... should read "should be investigated immediately"

Alex Moundalexis  Dec 10, 2012 
Printed Page 97
4th paragraph

Under the section titled dfs.datanode.failed.volumes.tolerated, the sentence:

"This leaves only the case where all disks fail in relatively rapid succession; an anomalous situation that should be invested immediately"

The word "invested" should be replaced with "investigated".

Vijay Thakorlal  Jan 02, 2013 
Printed Page 101
2nd paragraph, 5th sentence

The sentence:

"The easiest way to accomplish this is to ensure that the uid of the user the namenode process run as is the same on all namenodes"

Should that actually say: "runs as is the same on all namenodes" instead of "run as is the same on all namenodes"?

Vijay Thakorlal  Jan 02, 2013 
Printed Page 106
United States

Bold section title reads "Initialzing ZooKeeper State" instead of "Initializing"

Alex Moundalexis  Dec 10, 2012 
Printed Page 106
United States

Within "ha.zookeeper.quorum" section, it's stated that "a quorum of at least three nodes should be used." Emphasis should be added that the number of ZooKeeper nodes ought to be an ODD number.

Alex Moundalexis  Dec 10, 2012 
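
Note on the Page 106 report above: the preference for an odd number of ZooKeeper nodes follows from majority voting. The sketch below assumes only that an ensemble stays available while a strict majority of its members is running. The function names are hypothetical.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can fail while a strict majority is still running."""
    return ensemble_size - majority(ensemble_size)

for n in range(3, 8):
    print(f"{n} nodes: majority {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes, or from 5 to 6, does not increase the number of failures that can be tolerated, which is why an even-sized ensemble adds cost without adding resilience.
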
Printed Page 110
United States

In first paragraph after log output, text reads "the failover over controller" -- strike extra "over" so text reads "the failover controller"

Alex Moundalexis  Dec 10, 2012 
Printed Page 122
immediately under "Optimization and Tuning" heading

Immediately under the "Optimization and Tuning" section is listed:

mapred.java.child.opts

This should be

mapred.child.java.opts

Hari Sekhon  Nov 25, 2012 
Printed Page 128
end of 3rd paragraph

"... commonly overridden at the job level based on the user's priori knowledge of the data or the work being performed."

extra "i" at the end of prior.

Hari Sekhon  Nov 25, 2012 
Printed Page 135
United States

Line 2, use of "discreet" ought to be changed to "discrete" assuming that the desired meaning is "distinct." This condition exists elsewhere, too; distinct might be a more clear word choice.

Alex Moundalexis  Dec 26, 2012 
Printed Page 138
United States

KDC is defined in paragraph 3 (just below Table 6-1), but the acronym is used previously in paragraph 2 (3rd from last line).

Alex Moundalexis  Dec 26, 2012 
Printed Page 143
United States

3rd sentence in Configuring Hadoop Security, "at the time of this book" -> "at the time of publication"

Alex Moundalexis  Dec 26, 2012 
Printed Page 152
min.user.id trap section 2/3 way down page

Technical inconsistency: the "min.user.id" section states that the default value in CDH is 1000, but the trap section immediately below it states:

"This means that the CDH default of 500 will cause all tasks to fail by default"

Hari Sekhon  Nov 26, 2012 
Printed Page 152
paragraph 6

Under the explanation of the min.user.id parameter, it states:

"The default value of this is 1000 in CDH while Apache Hadoop has no default"

Then in the warning / caution directly below this it states:

"A number of Linux distributions, notably CentOS and RHEL, start the uid numbering of user accounts at 500. This means that the CDH default of 500 will cause all tasks to fail by default."

Unless I have misunderstood the explanation - the two paragraphs appear to contradict each other with respect to what is the CDH default for the min.user.id parameter.

Vijay Thakorlal  Jan 02, 2013 
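
Note on the two Page 152 reports above: the contradiction can be made concrete with a small sketch. It assumes that min.user.id acts as a lower bound, so that tasks are refused for any user whose uid is below it. The function name task_allowed is hypothetical and is not part of Hadoop.

```python
def task_allowed(uid: int, min_user_id: int) -> bool:
    """Assumed rule: a task may run only if the user's uid is at least min.user.id."""
    return uid >= min_user_id

FIRST_REGULAR_UID = 500  # where CentOS and RHEL start numbering, per the quoted warning

for threshold in (500, 1000):
    allowed = task_allowed(FIRST_REGULAR_UID, threshold)
    print(f"min.user.id={threshold}: uid {FIRST_REGULAR_UID} allowed -> {allowed}")
```

Under this reading, a threshold of 500 would admit a user with uid 500, while a threshold of 1000 would reject it. The quoted warning about tasks failing therefore appears consistent only with a default of 1000.
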
Printed Page 156
United States

2nd paragraph, prior to Example 6-8, line 2:

"the users that allowed to perform" is missing a word, should read "the users that ARE allowed to perform"

Alex Moundalexis  Dec 26, 2012 
Printed Page 157
United States

Last paragraph on page, 3rd sentence: "primary three" -> "three primary"

Alex Moundalexis  Dec 26, 2012 
Printed Page 162
United States

Last paragraph, "strategies that effect durability" reads like a verb is desired, ought to be changed to "strategies that affect durability"

Alex Moundalexis  Dec 26, 2012 
Printed Page 164
United States

First paragraph of Tying It Together, 2nd to last sentence: "there are few things ONE can do"; recommend changing "one" to more informal "you" to fit the tone of the rest of the section.

Alex Moundalexis  Dec 26, 2012 
Printed Page 178
4th bullet point

"Pools can a have a weight that is only considered during fair share allocation"

Grammar/typo around "can a have": remove the first "a".

Hari Sekhon  Nov 27, 2012 
Printed Page 181
mapred.fairscheduler.assignmultiple

Inconsistency regarding Fair Scheduler:

Page 179 says the default is to assign 1 task per tasktracker heartbeat which contradicts page 181's mapred.fairscheduler.assignmultiple default value: true

It looks like this was indeed false by default, at least in 0.20.x

http://hadoop.apache.org/docs/r0.20.2/fair_scheduler.html

but there is a ticket to set the default to true and possibly remove the option altogether for newer versions:

https://issues.apache.org/jira/browse/HADOOP-4788

This option appears to no longer exist in the latest documentation, which suggests that this was actually done:

http://hadoop.apache.org/docs/mapreduce/current/fair_scheduler.html

Page 181 should probably be corrected to default false and a note added about this change across versions.

Hari Sekhon  Nov 27, 2012 
Printed Page 186
2nd paragraph

2nd paragraph, 2nd sentence:

"... although capacity is still distributed appropriate across queues."

appropriately is missing the "ly" on the end.

Hari Sekhon  Nov 27, 2012 
Printed Page 186
line 3

"with the most starved queues receive slots first" ; change receive to receiving.

Alex Moundalexis  Dec 28, 2012 
Printed Page 187
Note entitled "On deprecated memory related parameters"

In the note entitled "On deprecated memory related parameters", the first sentence begins with the words:

"Some of parameters to control memory-aware scheduling [..]"

There is a missing "the" after the word "of".

Vijay Thakorlal  Jan 02, 2013 
Printed Page 204
2nd paragraph

In the middle of the 2nd paragraph

"Bare in mind..."

should be

"Bear in mind..."

Hari Sekhon  Nov 26, 2012 
Printed Page 210
last paragraph, 3rd from last line

"We'll use this approach to diagnosing problems encountered": change "diagnosing" to "diagnose"

Alex Moundalexis  Dec 28, 2012 
Printed Page 211
2nd paragraph, 6th line

"sufficient capacity on one of [the] power distribution units": missing "the"

Alex Moundalexis  Dec 28, 2012 
Printed Page 212
last line

"maintaining health clusters" -> change health to "healthy"

Alex Moundalexis  Dec 28, 2012 
Printed Page 212
Last paragraph

In the last sentence / paragraph on this page, the word "health" should actually be "healthy"

"Beyond Hadoop proper, operating system misconfiguration is an equal source of pain in maintaining health clusters."

Vijay Thakorlal  Jan 02, 2013 
Printed Page 213
2nd paragraph 4th sentence

Under the section entitled "Hardware Failures", there is a typo in that mean time to failure is given the incorrect acronym MTTR; this should be MTTF.

Vijay Thakorlal  Jan 02, 2013 
Printed Page 219
last paragraph, 3rd last sentence

3rd last sentence:

"Each change should be evaluated in the context of the likelihood that the given problem would repeat in the future, or if it were truly annomolous."

Spelling mistake, I believe it should be spelled "anomalous"

Hari Sekhon  Nov 27, 2012 
Printed Page 234
example 10-4 .class values

Example 10-4 is about using GangliaContext, but within the example configuration jvm.class and dfs.class are defined with file.FileContext values. They should refer to ganglia.GangliaContext values instead.

Alex Moundalexis  Dec 28, 2012 
Printed Page 238
1st paragraph

Bold term MericsSink should be MetricsSink.

Alex Moundalexis  Dec 28, 2012 
Printed Page 246
1st bullet under Recommendation

"lower than a acceptable threshold": change "a" to "an"

Alex Moundalexis  Dec 28, 2012 
Printed Page 252
3rd paragraph

I believe with distcp you access httpfs via webhdfs:// protocol, not httpfs://. Working example:

hadoop distcp hdfs://elephant:8020/user/training/elephant/shakespeare webhdfs://monkey:14000/user/training/fred12

Using httpfs:// in the same example yields an exception:

hadoop distcp hdfs://elephant:8020/user/training/elephant/shakespeare httpfs://monkey:14000/user/training/fred12
Copy failed: java.io.IOException: No FileSystem for scheme: httpfs

David Goldsmith  Jun 13, 2013 
Printed Page 253
2nd line on page

use of "discreet" ought to be changed to "discrete" assuming that the desired meaning is "distinct." This condition exists elsewhere, too; distinct might be a more clear word choice.

Alex Moundalexis  Dec 28, 2012 
Printed Page 253
3rd paragraph, line 2

use of "discreet" ought to be changed to "discrete" assuming that the desired meaning is "distinct." This condition exists elsewhere, too; distinct might be a more clear word choice.

Alex Moundalexis  Dec 28, 2012 
Printed Page 254
2nd paragraph, 5th line

"streaming data throughout the data to two cluster" doesn't make sense; based on context of amortization, line should change second "data" to "day" to read "streaming data through the day to two clusters"

Alex Moundalexis  Dec 28, 2012 
Printed Page 254
2nd paragraph, 2nd line

use of "discreet" ought to be changed to "discrete" assuming that the desired meaning is "distinct." This condition exists elsewhere, too; distinct might be a more clear word choice.

Alex Moundalexis  Dec 28, 2012 
ePub Page 591
Location 591 in the Kindle version of the book

Reading of HDFS is described as working by the client fetching only the *first* block id and locations for a file. Just after that (on the same page), the author mentions the full list of block ids and locations for the file without mentioning how one goes from the single first block id/location to the full list.

In reality, for the first read (as well as subsequent reads), multiple block ids and locations are fetched in batch.

Although this is not majorly important, it just jumped out as being not quite correct (and possibly not making much sense due to the missing link between the reference to the first block id and location and the full list of block ids and locations).

Gabriel Reid  Oct 25, 2012 
PDF, ePub, Mobi Page 10672
text

Typos:

"effect a state transition" should be "affect a state transition"

"destablizing" should be "destabilizing"

Anonymous  Sep 17, 2019