Errata

Errata for Data Science for Business

The errata list records errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date submitted	Date corrected
PDF, ePub	Page Praise p.1 3rd quote	"principals" should be "principles"	Tom Fawcett	Aug 09, 2013	Dec 19, 2013
Other Digital Version	Table 10-4 + index	"Dizzie" Gillespie should be Dizzy. Note from the Author or Editor: Fixed.	Steven Maude	Jan 05, 2014
Printed	Page 51 1st equation	The equation entropy = -p1 log(p1) - p2 log(p2) - ... should be entropy = -p1 log2(p1) - p2 log2(p2) - ... The base-2 subscripts on the log functions are missing. Note from the Author or Editor: The base of the log function doesn't really matter. Entropy is only used to compare variables, and changing the base only scales the entropy values by a constant factor, which doesn't change their ordering. And we say: "(and for the technically minded, the logarithm is generally taken as base 2)". But since we have graphs showing actual values, I'll add subscripts to make the equation more precise.	Chia-Ming Yu	May 03, 2016
PDF	Page 54 1st paragraph, 2nd line of the equation	When calculating the information gain (IG), the probabilities of the "star" and "dot" instances are written as 0.43 and 0.57, respectively. It should be 0.47 for "star" instances and 0.53 for "dot" instances, as correctly indicated on Figure 3-4. As a result, the IG will be IG = 0.99 - [0.47 x 0.39 + 0.53 x 0.79] = 0.39, instead of 0.37. Note from the Author or Editor: Reader is correct. Correction is adopted. Changed by TF.	Adler Santos	Oct 25, 2015
Printed, PDF, ePub, Mobi, Other Digital Version	Page 71 Code example for classification by IF-THEN rules	The last rule should say: IF (Balance >= 50k) and (Age >= 45) THEN Class = No Write-Off. As it stands, the same condition repeats in lines 2 and 3, which is incorrect.	Gregor Hohpe	Sep 20, 2013	Dec 19, 2013
	75 5th in safari books online; 3rd paragraph in physical book (page 75)	Original: "The results are in Figure 3-17, with a table listing the exact values. As you can see, the first three variables—the house value, the number of leftover minutes, and the number of long calls per month—have a higher information gain than the rest." Error: "the number of leftover minutes" does not have the second-highest information gain. The top three are (in descending order): 1) HOUSE 2) OVERAGE (this is the right one, instead of LEFTOVER, which is No. 4) 3) LONG_CALLS_PER_MONTH. Corrected: "The results are in Figure 3-17, with a table listing the exact values. As you can see, the first three variables—the house value, the overcharges per month, and the number of long calls per month—have a higher information gain than the rest." Note from the Author or Editor: Reader is correct. Reader's correction adopted.	Israel Mojica	Dec 02, 2015
Printed	Page 84-86 Figure and equations	pp. 84, Figure 4-3: Decision Boundary should be "Age = Balance x -1.5 + 60". pp.85, Equation 4-1, it should be "1.0 x Age + 1.5 x Balance - 60" for both conditions. pp. 86, the equation should be "f(x) = 60 - 1.0 x Age - 1.5 x Balance". Note from the Author or Editor: Errors 1 and 3 had already been fixed in version 2, but we hadn't caught the second one. Fixed for next version.	Craig	Feb 08, 2014
Printed	Page 86 Equation 4-1. Classification function	The equation reads as follows (in version 2): (plus) if - 1.0 x Age - 1.5 x Balance + 60 > 0 (dot) if - 1.0 x Age - 1.5 x Balance + 60 <= 0 The errata that is already documented and approved removes the minus sign after the if. (plus) if 1.0 x Age - 1.5 x Balance + 60 > 0 (dot) if 1.0 x Age - 1.5 x Balance + 60 <= 0 I believe this is not the correct fix. This makes the formula work, but it doesn't flow with the later formula on the page that reads f(x) = 60 - 1.0 x Age - 1.5 x Balance. Nor does it flow with the original formula on page 85: Age = (-1.5) x Balance + 60 With Algebra you can rewrite the formula as follows: 0 = - Age -1.5 x Balance + 60 I believe the confusion is based around the negative slope. I suggest that you keep the minus sign (-) and flip the comparison operators to read as follows: (plus) if - 1.0 x Age - 1.5 x Balance + 60 < 0 (dot) if - 1.0 x Age - 1.5 x Balance + 60 >= 0 Note from the Author or Editor: Thanks for your close reading. I've adopted your suggested change for v.3, and I'll double check the chapter before the next release to make sure it's all consistent. -Tom	Bill Hoenig	May 13, 2014
ePub	Page 94 3rd Paragraph	The paragraph states that hinge loss incurs no penalty for an example that is not on the wrong side of the margin, and that the penalty only becomes positive when an example is on the wrong side of the boundary and beyond the margin. This is the same type of mistake as the erratum already entered about the hinge loss image on page 93. Hinge loss is also incurred for items on the right side of the decision boundary that are still within the margin. Note from the Author or Editor: Thanks. We have remade the diagram completely due to our (repeated) mistakes with it. -Tom	Tom Pauwaert	Feb 18, 2015
Printed, PDF, ePub, Mobi, Other Digital Version	Page 107 3rd paragraph	"paramenters" should be "parameters" Note from the Author or Editor: Fixed.	Huguette Barriere	Sep 19, 2013	Dec 19, 2013
Other Digital Version	119 End of penultimate line in page	Found in Google Books version: "...the two separating lines are so similiar..." "Similiar" should be similar. Note from the Author or Editor: Fixed.	Steven Maude	Dec 29, 2013
Printed, PDF, ePub, Mobi, Other Digital Version	Page 132 Under Figure 5-11 description	"perforance" should probably be "performance" Note from the Author or Editor: Fixed. Thanks!	Alfonso MHC	Sep 26, 2013	Dec 19, 2013
Printed	Page 137-138 All equations	In all the equations, the "w" subscript on arg max is shown below the word, when it should be a subscript on "max". For comparison, see first paragraph on p. 137 "(The arg max_w just means..." Book version: First Edition, Second Release Note from the Author or Editor: Book inconsistently expressed the arg max function in different places. There is no standardized representation so I chose one and used it. Fixed by author.	Brent Brewington	Nov 28, 2015
Printed, PDF, ePub, Mobi, Other Digital Version	Page 146 2nd paragraph	Line 8 on page 146 says "Whiskeys with Euclidean distance..." but the distances in the table below it are calculated using the Jaccard distance, not the Euclidean distance. Note from the Author or Editor: Reader is correct -- the distance function had been set to Jaccard. The distance function was changed to Euclidean and the table regenerated. Results changed slightly but do not invalidate any points made in the text. The new table (Whiskey: Distance) is: Bunnahabhain: 0.00; Glenglassaugh: 3.00; Ardbeg: 3.16; Bruichladdich: 3.16; Tullibardine: 3.32; Aultmore: 3.46.	Emil Vissing	Jan 25, 2016
PDF	Page 169 1st and 2nd paragraphs	Paragraphs 1 and 2 on this page refer to the top and bottom of figure 6.9, while that figure has only 1 part and that is related to what is referred to as the bottom of figure 6.9 in the 2nd paragraph of page 169. Note from the Author or Editor: FIXED; diagram now only shows close-up, and text has been altered to reflect this.	Payam Bagheri	Nov 03, 2017
PDF	Page 170 1st paragraph	The end of the 1st paragraph ("The most unusual tasting single malt in the data appears to be Aultmore, at the very top, which is the last whiskey to join any others.") refers to something that is not shown in figure 6.9 (it actually refers to the top part of 6.9, which has been eliminated from the figure). Note from the Author or Editor: FIXED; sentence removed.	Payam Bagheri	Nov 03, 2017
PDF	Page 171-172 Figures	Figures 6.10 and 6.11 are in the wrong order with regard to what they illustrate. Note from the Author or Editor: FIXED; will proofread after next draft.	Payam Bagheri	Nov 06, 2017
Printed	Page 177 bottom	'Centroid 7' should probably read 'Cluster 7' to be in line with the other headers called 'Cluster 1' to 'Cluster 6'? Note from the Author or Editor: Fixed. Thanks!	Thomas Krawinkel	May 23, 2015
Printed	Page 181 Explanatory text of Figure 6-14	The last sentence says: "So, the leftmost leaf corresponds to the segment of the population with around body and sherry nose, and the whiskeys in this segment are mostly from cluster J." Instead, the sentence should apply to the RIGHTMOST leaf, as the tree is displayed in the figure, I believe. Note from the Author or Editor: Fixed.	Roland Acra	Oct 04, 2013	Dec 19, 2013
Printed	Page 182 End of first paragraph on the page	In the second to last sentence of the top paragraph on the page, the use of the words "intergroup" and "intragroup" has been inverted. The text says: "To put it another way: characteristic descriptions concentrate on intergroup commonalities, whereas differential descriptions concentrate on intragroup differences." It should be the opposite, as the prefix "intra" refers to something "within" a set, whereas the prefix "inter" refers to something "among" sets. (For instance, international relations are relations among nations, not within each nation...) Note from the Author or Editor: Fixed.	Roland Acra	Oct 04, 2013	Dec 19, 2013
Printed	Page 183 top	I have two questions referring to the decision tree and the written rules: 1. The logic appears to be unnecessarily complicated. After the first decision branch (BODY is NOT round) the only way to get to 'J' is when COLOR = 'full gold' (3rd decision). So why bother with the 2nd decision that splits off 'red' to the 'not_J' class? 2. The 'code' for the logic and the translation into English don't match IMHO. It appears that the 'code' part misses information. For example, the first rule says '(NOSE=sherry = 1)'. The '=1' part indicates that we are looking for the term to be TRUE. But the first part of that same rule just reads '(BODY=round)' instead of '(BODY=round = 1)' even though we also want this condition to be true. In the 'code' for the second rule set the '=1' or '=0' parts indicating if we are looking for a TRUE or FALSE are missing completely. The way it is printed all individual conditions appear to be looking for TRUE, but in the tree we can see that sometimes TRUE and sometimes FALSE is required to get to the class 'J' leaf. Please provide feedback to both my questions whether I misunderstood something (and the book is correct as it is) or what the corrected version is. Note from the Author or Editor: (1) As we said, "It is important to note that these category values are not mutually exclusive (e.g., Aberlour’s palate is described as medium, full, soft, round and smooth). In general, any of the values can co-occur (though some of them, like Color being both light and smoky, never do) but because they can co-occur, each value of each variable was coded as a separate feature by Lapointe and Legendre." So COLOR=f.gold and COLOR=red do not exclude each other, and the logic isn't redundant. (2) I think you may have been looking at v.1 of the book, in which the tree had been badly mangled. Still, as you point out, there are errors in the rule translation. The two rules are simply: (a) `(BODY=round) AND (NOSE=sherry)` => `J`; (b) `(BODY!=round) AND (COLOR!=red) AND (COLOR=full_gold) AND (BODY=light) AND (FINISH=dry)` => `J`	Thomas Krawinkel	May 23, 2015
Printed, PDF, ePub, Mobi, Other Digital Version	Page 191 tables 7.2 and 7.3	Given that model A correctly identifies 30% of the negative examples and model B correctly identifies 30% of the positive examples, they achieve 65% accuracy, not 80%. Indeed, the two tables should be: (7.2) [500 350; 0 150] and (7.3) [150 0; 350 500], where a semicolon separates the two rows of each table. Note from the Author or Editor: Fixed.	Jean-Pierre Haeberly	Sep 02, 2013	Dec 19, 2013
PDF, ePub	Page 204 sidebar top of page	The sidebar has the text: Sensitivity = TN / (TN + FP) = True negative rate = 1 - False positive rate Specificity = TP / (TP + FN) = True positive rate Sensitivity and specificity here are reversed. Lots of potential references, but Wikipedia has it right: http://en.wikipedia.org/wiki/Sensitivity_and_specificity Note from the Author or Editor: Fixed.	Keith Woeltje	Dec 23, 2013
Printed	Page 230 1st paragraph	On page 230, before last sentence says : "The crossover between Tree and LR occurs at the same place on both graphs, however: at about 25% of the population." This should be replaced by : "The crossover between Tree and NB occurs at the same place on both graphs, however: at about 25% of the population." Note from the Author or Editor: Fixed by author (for 2nd edition)	Ludovic Theate	Dec 26, 2019
PDF, ePub	Page 242 middle of page; 3rd para	After "Our independence assumption allows us to rewrite this as:" we have two equations for p(E). Second equation is just a reformat of the first, and should be removed.	Tom Fawcett	Aug 09, 2013	Dec 19, 2013
PDF, ePub, Mobi	Page 242 2nd paragraph, last sentence	I think "In this case, E is the same for all" should be "In this case, p(E) is the same for all" Note from the Author or Editor: You are correct, it should be p(E). Fixed by author. -Tom	Janos Kovacs	Jan 26, 2014
Printed	Page 246 2nd to last paragraph	Table 9.1 and "...Then using Equation 9-4, my estimated probability would increase by 30% to 0.14 × 1.3 = 18%. If I have three Likes—Sheldon Cooper, Star Trek, and Lord of the Rings—then my estimated probability of High-IQ increases to 0.14 × 1.3 × 1.39 × 1.69 = 43%." The lift values are wrong here. Try adding more values and multiplying through: the result goes over 1, which is impossible. In fact it goes to about 2426! Those can't be the lift values, only percentage increases for single attributes or something. All the lift values when multiplied together should come to around 7.14 (7.14 x 0.14 = 1). Instead they come out to ~17330.16!!! Wally Wang wasn't really that much help on the math after all. Note from the Author or Editor: Fixed in 2nd release, 12/19/2013	Raymond Martin	Feb 16, 2015	Dec 19, 2013
Printed	Page 276 Sidebar	The publication by Mackskassy et al. (2001) is not included in the bibliography section. The same applies to the publications by Mao et al. (2011) and Fawcett & Provost (1999). Note from the Author or Editor: All three fixed by author.	Anonymous	Sep 30, 2016
Other Digital Version	285 3rd paragraph	"Furthmore" should probably be Furthermore. Note from the Author or Editor: Fixed.	Steven Maude	Jan 06, 2014
Other Digital Version	302 Figure 12-4 caption	Caption states "along two these two dimensions". Note from the Author or Editor: Fixed.	Steven Maude	Jan 06, 2014
Other Digital Version	318 2nd paragraph, 6th line	"stragegy" should be strategy. Note from the Author or Editor: Fixed.	Steven Maude	Jan 06, 2014
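
The author's note on the page 51 erratum says that changing the log base only rescales entropy by a constant factor and so does not change the ordering of variables. A minimal Python sketch of that claim follows; the `entropy` helper and the two example distributions are assumptions made for illustration and do not come from the book.

```python
import math

def entropy(probs, base=2.0):
    """Entropy of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical class distributions (not from the book).
dist_a = [0.5, 0.5]
dist_b = [0.9, 0.1]

bits_a, bits_b = entropy(dist_a), entropy(dist_b)                  # base 2
nats_a, nats_b = entropy(dist_a, math.e), entropy(dist_b, math.e)  # base e

# The ordering is the same in both bases.
print(bits_a > bits_b, nats_a > nats_b)
# The ratio between bases is the same constant, ln(2), for both distributions.
print(nats_a / bits_a, nats_b / bits_b)
```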
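
The arithmetic in the page 54 erratum can be checked with a short Python sketch. The function name `information_gain` is an assumption for illustration; the numbers are the ones quoted in the erratum (parent entropy 0.99, child entropies 0.39 and 0.79, and the corrected and uncorrected weights).

```python
def information_gain(parent_entropy, children):
    """children is a list of (weight, entropy) pairs, one per child segment."""
    return parent_entropy - sum(weight * h for weight, h in children)

# Corrected weights from the erratum: 0.47 and 0.53.
ig_corrected = information_gain(0.99, [(0.47, 0.39), (0.53, 0.79)])
# Weights as originally printed: 0.43 and 0.57.
ig_printed = information_gain(0.99, [(0.43, 0.39), (0.57, 0.79)])

print(round(ig_corrected, 2), round(ig_printed, 2))
```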
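
The fix adopted for Equation 4-1 (page 86 erratum) keeps the minus signs and flips the comparison operators, so that it agrees with f(x) = 60 - 1.0 x Age - 1.5 x Balance. A minimal Python sketch of that version is below. The function names and the sample points are assumptions for illustration, and the units of Balance are whatever the book's figure uses.

```python
def f(age, balance):
    """Linear function from the erratum: f(x) = 60 - 1.0*Age - 1.5*Balance."""
    return 60 - 1.0 * age - 1.5 * balance

def classify(age, balance):
    """Corrected rule as quoted: plus if f(x) < 0, dot if f(x) >= 0."""
    return "plus" if f(age, balance) < 0 else "dot"

# Hypothetical points, chosen only to exercise both branches.
print(classify(50, 20))  # f = 60 - 50 - 30 = -20
print(classify(10, 10))  # f = 60 - 10 - 15 = 35
# A point on the boundary Age = -1.5 * Balance + 60 gives f = 0.
print(f(30, 20))
```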
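
The point of the page 94 erratum, that hinge loss is also incurred on the correct side of the boundary while an example is still inside the margin, can be seen with the standard textbook form of hinge loss, max(0, 1 - y * f(x)). This is a sketch under that assumption; the book's diagram may scale the margin differently, and the scores below are made up.

```python
def hinge_loss(y, score):
    """Standard hinge loss for a label y in {-1, +1} and a raw model score."""
    return max(0.0, 1.0 - y * score)

# Positive example, correct side, beyond the margin: no penalty.
print(hinge_loss(+1, 2.0))
# Positive example, correct side, but inside the margin: small penalty.
print(hinge_loss(+1, 0.5))
# Positive example on the wrong side of the boundary: larger penalty.
print(hinge_loss(+1, -0.5))
```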
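
The 65% figure in the page 191 erratum can be reproduced from the corrected tables. The sketch below assumes that each table is a 2x2 confusion matrix whose diagonal holds the correctly classified counts; that layout is an assumption, as are the names `accuracy`, `model_a`, and `model_b`.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Corrected tables 7.2 and 7.3 as given in the erratum, read row by row.
model_a = [[500, 350], [0, 150]]
model_b = [[150, 0], [350, 500]]

print(accuracy(model_a), accuracy(model_b))
```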
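
For the page 204 erratum, the un-reversed definitions are sensitivity = TP / (TP + FN), the true positive rate, and specificity = TN / (TN + FP), the true negative rate. A small Python sketch with made-up counts, which are assumptions and not data from the book:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP), which equals 1 - false positive rate."""
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts.
tp, fn, tn, fp = 40, 10, 45, 5

false_positive_rate = fp / (fp + tn)
print(sensitivity(tp, fn), specificity(tn, fp), 1 - false_positive_rate)
```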