Errata for Data Algorithms

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
PDF	Page xxvi end of first paragraph	The line "For example, if DNA-Sequencing1 takes 60 hours with three servers, then by "scaling out" the solution might produce the same DNA-Sequencing with 50 similar servers in less than 2 hours." should say "For example, if DNA-Sequencing1 takes 60 hours with three servers, then by "scaling out" the solution might produce the same DNA-Sequencing with 50 similar servers in less than 4 hours." Reason: 60 hours on 3 servers is 180 server-hours. Assuming linear scaling, 50 servers would need 180 / 50 = 3.6 hours (approximately 4 hours), and it would take 100 servers to finish in about 2 hours.	Manoj Agarwal	Nov 23, 2014
Printed	Page 3 2nd bullet point	Page 3 (second bullet point) credits Java Code Geeks for Secondary Sorting. I think this should be attributed to "Hadoop: The Definitive Guide" by Tom White, as the technique was widely publicized by him. Even the Java Code Geeks link says this; see its Resources section.	Anonymous	May 19, 2016
Printed	Page 4 Example 1-1. DateTemperaturePair class	Page 4, Example 1-1: the DateTemperaturePair class is defined as "public class DateTemperaturePair implements Writable, WritableComparable<DateTemperaturePair> { ... }". There is no need to implement "Writable" separately, as "WritableComparable" already extends it. See the WritableComparable documentation at http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html for more details.	Anonymous	May 19, 2016
Printed	Page 6 Code line immediately above Data Flow Using Plug-in Classes	job.setGroupingComparatorClass(YearMonthGroupingComparator.class) should instead read job.setGroupingComparatorClass(DateTemperatureGroupingComparator.class). Note that there is no YearMonthGroupingComparator class. The code on GitHub shows this correctly: https://github.com/mahmoudparsian/data-algorithms-book/blob/master/src/main/java/org/dataalgorithms/chap01/mapreduce/SecondarySortDriver.java	Todd Farmer	May 12, 2016
Printed	Page 7 Figure 1-2. Secondary sorting data flow	The output of partition() shows data for YearMonth value of 2000-11 appearing in both partitions. The DateTemperaturePartitioner class partitions by the YearMonth value, and should result in pairs with the same YearMonth value routed to the same partition.	Todd Farmer	May 12, 2016
PDF	Page 47 Chapter 2, Top-10 List	The described parallelisation approach has a fundamental flaw. Constructing a global top-N from a series of local top-N lists might not produce the correct output when members of the global top-N are absent from some (or all) of the local top-N lists. To illustrate with a very simple example, consider a top-2 calculation based on the following local top-3 lists. Top-3 list 1: (A, 5), (B, 4), (C, 3). Top-3 list 2: (D, 5), (E, 4), (C, 3). The global number-one key is C, with a total value of 3 + 3 = 6, but if we took only the local top-2 lists, C would be left out entirely. See also this discussion on Stack Overflow: http://stackoverflow.com/questions/15613966/parallel-top-ten-algorithm-for-distributed-data	Robbert Zijp	Aug 24, 2014
Printed	Page 260 2nd paragraph	The 1st bullet point reads "Give that today is foggy, what is the probability that it will be rainy two days from now?" The problem asks for S3 to be "Rainy", but the solution given in the text after that line is worked with S3 as "Foggy".	Sumit Pal	Feb 28, 2016
PDF	Page 687 3rd bullet item, starting with 'It does not allow false negative errors'	There is an error in this sentence: 'This means that if x is /not/ in the set, then for sure it will indicate that x is not in the set.' This should be: 'This means that if x is in the set, then for sure it will /not/ indicate that x is not in the set.' The original sentence also contradicts the previous bullet about false positive errors, which are allowed: 'This means that for some x, which is not in the set, Bloom filter might indicate that x is in the set.' Both the 2nd and the 3rd bullet describe the case where x is not in the set. According to the 2nd bullet, a Bloom filter might report that x is in the set, but according to the 3rd bullet, the Bloom filter in the same case will never report that x is in the set.	Robbert Zijp	Aug 24, 2014
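
The Page 47 report can be checked with a few lines of Java. The sketch below is not the book's code: the class name TopNMergeDemo and all of its method names are hypothetical, and plain in-memory maps stand in for the mapper inputs. It uses the two lists from the report and compares merging local top-2 lists against aggregating values per key first. Note that the discrepancy only arises when the same key can appear in more than one partition; if every key lives in exactly one partition, merging local top-N lists gives the same answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; names are illustrative and not taken from the book.
public class TopNMergeDemo {

    // Keys of the n largest values; ties are broken by key so the result is deterministic.
    public static List<String> topNKeys(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byValue = Integer.compare(b.getValue(), a.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            keys.add(e.getKey());
        }
        return keys;
    }

    // Sum the values of each key across all partitions.
    public static Map<String, Integer> sumByKey(List<Map<String, Integer>> partitions) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> partition : partitions) {
            for (Map.Entry<String, Integer> e : partition.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Approach questioned in the report: trim each partition to its local top n, then merge.
    public static List<String> topNFromLocalTopN(List<Map<String, Integer>> partitions, int n) {
        List<Map<String, Integer>> trimmed = new ArrayList<>();
        for (Map<String, Integer> partition : partitions) {
            Map<String, Integer> local = new HashMap<>();
            for (String key : topNKeys(partition, n)) {
                local.put(key, partition.get(key));
            }
            trimmed.add(local);
        }
        return topNKeys(sumByKey(trimmed), n);
    }

    // Alternative: aggregate per key over all partitions first, then take the top n.
    public static List<String> topNAfterAggregation(List<Map<String, Integer>> partitions, int n) {
        return topNKeys(sumByKey(partitions), n);
    }

    // The two local lists from the Page 47 report.
    public static List<Map<String, Integer>> erratumExample() {
        Map<String, Integer> list1 = new HashMap<>();
        list1.put("A", 5);
        list1.put("B", 4);
        list1.put("C", 3);
        Map<String, Integer> list2 = new HashMap<>();
        list2.put("D", 5);
        list2.put("E", 4);
        list2.put("C", 3);
        return Arrays.asList(list1, list2);
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partitions = erratumExample();
        System.out.println("local top-2, then merge: " + topNFromLocalTopN(partitions, 2)); // [A, D]
        System.out.println("aggregate, then top-2:   " + topNAfterAggregation(partitions, 2)); // [C, A]
    }
}
```

With the report's data, the first approach never sees C, while aggregating first ranks C (total 6) at the top.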
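
The Page 687 report turns on which kind of error a Bloom filter can make. The following is a minimal sketch, not the book's implementation: the class SimpleBloomFilter, its parameters, and the way bit positions are derived from String.hashCode() are all assumptions chosen for brevity. It shows why a false negative cannot occur: add() sets every bit that mightContain() later checks, so an added item is always reported as possibly present. A false positive remains possible because unrelated items can happen to set the same bits.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter for illustration only.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int numHashes;  // number of bit positions derived per item

    public SimpleBloomFilter(int size, int numHashes) {
        if (size <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("size and numHashes must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // i-th bit position for an item, derived from two simple hashes (an illustrative choice).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // force an odd step so the positions differ
        return Math.floorMod(h1 + i * h2, size);
    }

    // Set every bit position associated with the item.
    public void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    // false: the item was definitely never added.
    // true: the item was added, or other items happened to set the same bits (false positive).
    public boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(64, 3);
        filter.add("x");
        System.out.println(filter.mightContain("x")); // true: an added item is never reported absent
    }
}
```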