
Errata for Practical Statistics for Data Scientists


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted By	Date submitted	Date corrected
PDF	Page 2 1st paragraph 1st line	typo: availablility --> availability Note from the Author or Editor: erratum is correct p. 2 availablility --> availability	Joon-Yong Lee	Jul 29, 2017	May 11, 2018
PDF	Page 6-8 examples, code etc.	Less of an erratum; more of a suggestion. I had some issues with your script for downloading from googledrive. It was easy enough to repair; nevertheless, I'd suggest that you look into the R googledrive package, or at least consider writing something a little less brittle. At the moment I'm working my way through the 'sample' copy. I haven't yet decided to purchase the book. Regardless, thank you. Kevin Casey. P.S. The typesetting also needs some repair: some of your equations are improperly formatted. Note from the Author or Editor: I updated the download_data.r script to use the googledrive package; since this requires installation/update of quite a few packages, I also left my previous version, wrapped in an if(FALSE){} clause.	Kevin Casey	Mar 04, 2018	May 11, 2018
Printed	Page 15, 18 15 formula, 18 near bottom of page	On each of these two pages, MAD has been written as Mean Absolution Deviation. In other places, the A is referenced as 'absolute'. Having checked its definition, I don't think the word "absolution" is really a possibility here. Note from the Author or Editor: erratum is correct: page 15 formula, and again page 18 near bottom of page, "absolution" should be replaced with "absolute"	Tom Robey	Jul 24, 2017	May 11, 2018
Printed, PDF, ePub	Page 16 2nd paragraph	The last sentence of the second paragraph reads "However, if you divide by n - 1 instead of n, the standard deviation becomes an unbiased estimate." Dividing by n - 1 instead of n produces an unbiased estimate of the variance, but the estimate of the standard deviation is still biased. See https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation. Note from the Author or Editor: in box on p. 16, end of second para: EXISTING: "... the standard deviation becomes an unbiased estimate." CHANGE TO: "... the variance becomes an unbiased estimate."	David W. Body	Mar 09, 2018	May 11, 2018
PDF	Page 27 Top of page	Table 1-3 is a repetition of Table 1-2 Note from the Author or Editor: Correct, this is for reader convenience since the earlier presentation is 6 pages away. Can reword the introductory sentence above it as follows: "Table 1-3 (repeated from Table 1-2, earlier, for convenience) ..."	Anonymous	Apr 21, 2017	Jun 23, 2017
Printed	Page 40 last paragraph	The sentence "Now the picture is much clearer: tax-assessed value is much higher in some zip codes (98112, 98105) than in others (98108, 98057)." References two zip codes that aren't in Figure 1-12 (98112 and 98057.) Either it should read "Now the picture is much clearer: tax-assessed value is much higher in some zip codes (98105, 98126) than in others (98108, 98188)." or the plot titles in the figure are incorrect. Note from the Author or Editor: The correction is right - the sentence at the bottom of p. 40 should read "Now the picture is much clearer: tax-assessed value is much higher in some zip codes (98105, 98126) than in others (98108, 98188)."	Anonymous	Apr 02, 2018	May 11, 2018
PDF	Page 41 last paragraph	This idea has propogated to various modern graphics systems --> This idea has propagated to various modern graphics systems Note from the Author or Editor: p. 41, para at bottom: 1. 3rd line in para, change propogated to propagated ALSO 2. The items in brackets and parentheses are meant to be index citations; they should be indexed and should not appear in the text. These items are [Trellis-Graphics] ([lattice]) ([seaborne]) ([bokeh])	JOON-YONG LEE	Jan 02, 2018	May 11, 2018
PDF	Page 45 3rd paragraph	The name was wrong. Al Landon --> Alf Landon Note from the Author or Editor: erratum is correct. p. 45, 3rd para change "Al Landon" to "Alf Landon"	JOON-YONG LEE	Jan 12, 2018	May 11, 2018
PDF	Page 62 1st para	A mistake on this equation: [(1 – [x/100]) / 2]% --> [(100 – x) / 2]% Note from the Author or Editor: erratum is correct. P. 62 top line: EXISTING ... trim [(1-[x/100])/2]% ... CHANGE TO ... trim [(100-x)/2]%...	JOON-YONG LEE	Jan 19, 2018	May 11, 2018
PDF	Page 65 1st para	typo: prodigous --> prodigious Note from the Author or Editor: make change prodigous --> prodigious	JOON-YONG LEE	Jan 17, 2018	May 11, 2018
PDF	Page 67 last paragraph	typo: anamolous --> anomalous Note from the Author or Editor: erratum is correct - p. 67 last para - anamolous --> anomalous	JOON-YONG LEE	Jan 17, 2018	May 11, 2018
PDF	Page 71 first equation	the last term in the formula should be s/sqrt(n) and not s/n Note from the Author or Editor: erratum is correct p. 71, formula at the top the last term in the formula should be s/sqrt(n) and not s/n	Anonymous	Jun 29, 2017	May 11, 2018
PDF	Page 76 2nd para	where the mean number of events per time period is 2 --> where the mean number of events per time period is 0.2. Note from the Author or Editor: p. 76, first para: erratum is correct EXISTING where the mean number of events per time period is 2 CHANGE TO where the mean number of events per time period is 0.2.	JOON-YONG LEE	Jan 21, 2018	May 11, 2018
Printed	Page 87 Bottom of first major paragraph	Text says, "This means that extreme chance results in only one direction direction count toward the p-value." 'Direction' is written twice in a row. Note from the Author or Editor: erratum is correct p. 87, first para after the header, next to last and last line - eliminate one of the "direction"	Tom Robey	Jul 24, 2017	May 11, 2018
Printed	Page 93 For Further Reading section	Bruce's Introductory Statistics and Analytics book is listed with a 2015 date. Pages 88 and 101, the book is listed as 2014. Note from the Author or Editor: erratum is correct p. 93 towards bottom, second item in Further Reading date on Bruce book should be 2014	Tom Robey	Jul 24, 2017	May 11, 2018
Printed	Page 98 Bottom of Data Science and P-Values paragraph	Sentence reads, " - a feature night be included or ... ". I am thinking the word should have been might. Note from the Author or Editor: erratum is correct p. 98, next to last line above box, change "night" to "might"	Tom Robey	Jul 26, 2017	May 11, 2018
PDF	Page 104 Bottom line	The alternative hypothesis uses B > A instead of B < A (or the null hypothesis needs to be changed). Note from the Author or Editor: erratum is correct p. 104 printed edition: last line, should be "A > B" instead of "B > A"	Anonymous	Jul 25, 2017	May 11, 2018
Printed	Page 111 Last sentence.	First release (2017-05-09) of first print edition (May 2017) has Greek letter xi (Unicode 03BE) where Greek letter chi (Unicode 03C7) is meant. Same goes for second formula on page 113. Note from the Author or Editor: erratum is correct p. 111, last line and p. 113, formula in center of page replace Greek letter xi (Unicode 03BE) with Greek letter chi (Unicode 03C7)	Stephen Frost	Jul 11, 2017	May 11, 2018
Printed	Page 111-114 Throughout	The text is inconsistent in its use of "chi-square" vs. "chi-squared". The main section is titled "Chi-Square Test", however page 113 references "the chi-squared statistic" twice, page 114 contains a section titled "Chi-Squared Test: Statistical Theory" (but mentions "chi-square distribution"), and the output given by R states "Pearson's Chi-squared test". Note from the Author or Editor: erratum is correct. p. 113: in the sentence in the middle of the page, and again in the line beginning "where r and c...", change chi-squared to chi-square. p. 114: change chi-squared to chi-square in the header, and in the line following the header change "...distribution of the chi-squared statistic..." to "...distribution of the chi-square statistic...". DO NOT change anything in the R output	Matt Galisa	Aug 14, 2017	May 11, 2018
PDF	Page 112 2nd paragraph	Instead of "same result by random chance" - shouldn't it say same result or more extreme - or something like that? Note from the Author or Editor: Confirmed, but on page 96, not 112, 2nd para in the "P-Value" subsection of chapter 3, please replace "...achieve the same result by random chance..." with "... achieve a result as extreme as this, or more extreme, by random chance..."	Anonymous	May 02, 2017	Jun 23, 2017
PDF	Page 124 4th and 5th paragraph	(30% instead of 10%) --> (50% instead of 10%): because 50% is used in the following paragraph as an example. (say 165 ones and 9,868 zeros) --> (say 165 ones and 9,835 zeros): its sum should be equal to 10,000. Note from the Author or Editor: p. 124 - erratum is correct: at the end of the para starting "So we can try..." EXISTING (30% instead of 10%) CHANGE TO (50% instead of 10%)	JOON YONG LEE	Feb 07, 2018	May 11, 2018
PDF	Page 129 1st paragraph	interchangable --> interchangeable Note from the Author or Editor: p 129, 1st para: erratum is correct interchangable --> interchangeable	JOON YONG LEE	Feb 09, 2018	May 11, 2018
PDF	Page 136 First equation	Equation for RMSE shows estimate of y_i on LHS Note from the Author or Editor: The left-hand side of the equation should read "RMSE" not yi-hat; fixed in Atlas source	Anonymous	May 29, 2017	Jun 23, 2017
PDF	Page 139 last paragraph	"where p is the number of..." Here, p should be a capital P to keep consistency with P in the above AIC equation. Note from the Author or Editor: erratum is correct: p. 139, 3rd line from bottom: EXISTING: "where p is the number..." CHANGE TO: "where P is the number..." P retains its italics	JOON YONG LEE	Feb 11, 2018	May 11, 2018
Printed	Page 153 1st paragraph	The paragraph notes “adding a bathroom increases the sale price by $7,500” however in the previous code output, Bathrooms is shown as 5.537e+03 or about $5,500. Note from the Author or Editor: p. 153, end of first text para: erratum is correct EXISTING: ... increases the sale price by $7500 CHANGE TO ... increases the sale price by $5,537	Peter Edstrom	Feb 04, 2018	May 11, 2018
Printed	Page 154 Last paragraph	The slope of the main effect SqFtTotLiving shows as 1.176e+02 ($117) in the R output but the paragraph says $177. Thus for a home in the highest ZipGroup the slope is the sum of the main effect plus the interaction SqFtTotLiving:ZipGroup5 ($117 + $230 = $347) - the text shows 177 + 230 = 447 which not only does not match the R output but is also arithmetically incorrect (177 + 230 actually equals 407). Note from the Author or Editor: erratum is correct: p. 154 last para: line 3: ...$177 per square foot... should be ...$118 per square foot... lines 5 and 6: ...or $177 + 230 = $447... should be ...$118 + $230 = $348 per square foot... line 7: ...by a factor of almost 2.7... should be ...by a factor of more than 2.9...	Matt Galisa	Aug 14, 2017	May 11, 2018
PDF	Page 157 last paragraph	statuatory deed --> statutory deed Note from the Author or Editor: erratum is correct: p. 157, fifth line from bottom, statuatory deed --> statutory deed	JOON-YONG LEE	Feb 15, 2018	May 11, 2018
Printed	Page 170 Middle of main paragraph, under Generalized Additive Models	"Polynomial terms may not flexible enough ... " looks like the word 'be' is missing. Note from the Author or Editor: erratum is correct p. 170 EXISTING "Polynomial terms may not flexible enough ... " CHANGE TO "Polynomial terms may not be flexible enough ... "	Tom Robey	Aug 09, 2017	May 11, 2018
Printed	Page 170 Figure 4-12	Figure 4-12, described as representing spline regression, appears identical to Figure 4-10, representing polynomial regression on page 168 Note from the Author or Editor: The figure in 4.12 is wrong...it is currently a repeat of figure 4.10. I will update with the correct figure	Marshall Ehlinger	Feb 10, 2018	May 11, 2018
Printed	Page 196 Figure 5-5	Both rows of the figure are labeled y = 1; the lower row should be labeled y = 0. Note from the Author or Editor: erratum is correct: lower left cell of Figure 5-5 should read y=0	Matt Galisa	Aug 14, 2017	May 11, 2018
PDF	Page 196 Figure 5-5	Shorthand for Specificity labeled as FP/(y=0). It should be Specificity TN/(y=0). Note from the Author or Editor: erratum is correct: p. 196, chart, far right, Specificity should be TN/(y=0)	John Masiello	Sep 14, 2017	May 11, 2018
Printed	Page 197 Bottom of page	The denominator in the equation for specificity is incorrect. ∑FalseNegative should be replaced with ∑FalsePositive. Note from the Author or Editor: erratum is correct p. 197 last formula, second element in denominator should be ∑FalsePositive	Phil Terwilliger	Jan 11, 2018	May 11, 2018
PDF	Page 201 last paragraph	indiscriminantly --> indiscriminately Note from the Author or Editor: p. 201, second to last line, erratum is correct. Fix misspelling: indiscriminantly --> indiscriminately	JOON-YONG LEE	Feb 27, 2018	May 11, 2018
PDF	Page 205 1st paragraph in Data Generation	(see “Undersampling” on page 204) --> (see “Oversampling and Up/Down Weighting” on page 204) Note from the Author or Editor: p. 205, first line under heading "Data Generation," erratum is correct: EXISTING (see “Undersampling” on page 204) CHANGE TO (see “Oversampling and Up/Down Weighting” on page 204)	JOON-YONG LEE	Feb 28, 2018	May 11, 2018
PDF	Page 208 in further reading	Analytics Vidya --> Analytics Vidhya Note from the Author or Editor: erratum is correct - 3rd item under Further Reading should read ...Analytics Vidhya...	JOON-YONG LEE	Feb 28, 2018	May 11, 2018
Printed	Page 212 3rd paragraph	Describes the paid-off symbol as a triangle, but it is actually a cross. States the number of defaults (circles) as 14 and paid-off loans (crosses) as 6, but in Figure 6-2 there are 9 defaults and 11 paid off. For paragraph 1, the step of making a prediction with dti=22.5 and payment_inc_ratio=9 is not listed, just the outcome. Note from the Author or Editor: p. 212, para in the middle: EXISTING: The circles (default) and triangles (paid off) CHANGE TO: The circles (default) and crosses (paid off) ALSO EXISTING: "... 14 defaulted loans lie within the circle as compared with only 6 paid-off loans. Hence the predicted outcome of the loan is default" CHANGE TO: "... 9 defaulted loans lie within the circle as compared with 11 paid-off loans. Hence the predicted outcome of the loan is paid-off"	David Pugh	Mar 01, 2018	May 11, 2018
Printed	Page 221 diagram	The decision tree diagram could do with an explanation of which branch to follow if the node question is true or false. It's not immediately obvious that you go to the left if true and to the right if false. Note from the Author or Editor: In middle of p. 221, last line before figure: EXISTING: " ... traversing through a hierarchical tree, starting at the root until a leaf is reached" CHANGE TO: "...traversing through a hierarchical tree, starting at the root and moving left if the node is true and right if not, until a leaf is reached."	David Pugh	Mar 01, 2018	May 11, 2018
PDF	Page 222 last paragraph	righthand region --> lefthand region Note from the Author or Editor: last line on page 222: Change righthand region to lefthand region	JOON-YONG LEE	Mar 06, 2018	May 11, 2018
PDF	Page 223 Figure 6-4	The caption for Figure 6-4 is the same as the caption for Figure 6-3. It must be fixed. Note from the Author or Editor: p. 223 caption for figure 6.4 should read "The first three rules for a simple tree model fit to the loan data."	JOON-YONG LEE	Mar 06, 2018	May 11, 2018
PDF	Page 230 top of page	Says "refered to as random forest" instead of "referred" Note from the Author or Editor: erratum is correct p. 230 very first word on page should be "referred" instead of "refered"	Anonymous	Jul 18, 2017	May 11, 2018
PDF	Page 230 1st paragraph in Bagging	n records. --> N records: in Step 1 of the bagging algorithm, n means the size of the bootstrap resample. Note from the Author or Editor: Confirmed - In first para under "Bagging" header, end of sentence should read "with N records." instead of "with n records"	JOON-YONG LEE	Apr 02, 2018	May 11, 2018
Printed, PDF	Page 244 last paragraph	acting in a similar mannger --> manner Note from the Author or Editor: change: acting in a similar mannger --> manner	JOON-YONG LEE	Apr 09, 2018	May 11, 2018
PDF	Page 267 last paragraph	The oil stocks (XOM, CVS, SLB, COP) --> The oil stocks (XOM, CVX, SLB, COP) Note from the Author or Editor: erratum is correct - p. 267, next to last line, change CVS to CVX	JOON-YONG LEE	Apr 14, 2018	May 11, 2018
PDF	Page 268 main steps of the agglomerative algorithm	In "D(Ck,Cl))" in steps 2 and 3, the right parenthesis is duplicated. Note from the Author or Editor: Erratum is correct - p. 268, drop the extra right parenthesis in step 2 and step 3	JOON-YONG LEE	Apr 14, 2018	May 11, 2018
PDF	Page 272 1st paragraph in Mixtures of Normals	N1(μ1),Σ1), N1(μ2),Σ2), ..., N1(μK),ΣK) has the wrong dimension and misplaced parentheses; it should be N2(μ1,Σ1), N2(μ2,Σ2), ..., N2(μK,ΣK) Note from the Author or Editor: p. 272 - correction should be made as stated - last line of initial para should read N2(μ1,Σ1), N2(μ2,Σ2), ..., N2(μK,ΣK)	JOON-YONG LEE	Apr 14, 2018	May 11, 2018
PDF	Page 281 last code block	I cannot find the definition of "dnd_cut". We need this: dnd_cut <- cut(dnd, h=0.5) Note from the Author or Editor: erratum is correct. Bottom of p. 281, Insert an additional line of code ABOVE the two lines already there, so it reads > dnd_cut <- cut(dnd, h=0.5) > df[labels... etc.	JOON-YONG LEE	Apr 18, 2018	May 11, 2018
Other Digital Version	567 Table 1-5	I am using the Kindle edition; this is LOCATION 567, not a page number. Table 1-5 shows a States column in which all states are correctly allocated to their respective bin or break. The R code does not create this column. I don't know yet how to correct this problem, so your advice is awaited! Note from the Author or Editor: There was a bug in the R script that created the state abbreviations. I have uploaded the code with the following fix: state_abb <- state %>% arrange(Population) %>% group_by(PopFreq) %>% summarize(state = paste(Abbreviation, collapse=","), .drop=FALSE) %>% complete(PopFreq, fill=list(state='')) %>% select(state) state_abb <- unlist(state_abb)	Duncan Williamson	Sep 15, 2017	May 11, 2018
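The page 16 correction says that dividing by n - 1 gives an unbiased estimate of the variance, not of the standard deviation. A simulation makes this easy to check. This is an illustrative sketch only: the sample size of 5 and the standard normal population are arbitrary choices. R's var() divides by n - 1, so its average over many samples lands near the true variance of 1, while the square roots of the same estimates average noticeably less than the true standard deviation of 1.

```r
# Compare the n - 1 variance estimator with its square root by simulation.
set.seed(2017)
n <- 5          # small samples make the bias of the square root easy to see
reps <- 100000  # number of simulated samples

samples <- matrix(rnorm(n * reps), nrow = reps)  # each row is one sample from N(0, 1)
var_hat <- apply(samples, 1, var)  # var() divides by n - 1
sd_hat  <- sqrt(var_hat)           # equivalent to sd(), which also uses n - 1

mean(var_hat)  # close to the true variance, 1
mean(sd_hat)   # noticeably below the true standard deviation, 1
```

The shortfall follows from Jensen's inequality. Because the square root is concave, E[s] = E[sqrt(s^2)] <= sqrt(E[s^2]) = σ, with equality only when s does not vary.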
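The page 62 formula says how much to trim from each end of the bootstrap distribution to get an x% confidence interval. Below is a minimal R sketch of a percentile bootstrap interval. The exponential sample and the 2,000 resamples are assumptions made for illustration, and percentile_ci is a hypothetical helper, not a function from the book.

```r
# Percentile bootstrap interval: for an x% interval, trim (100 - x)/2 percent
# of the bootstrap statistics from each end (hypothetical helper).
percentile_ci <- function(stats, x = 90) {
  trim <- (100 - x) / 2  # percent trimmed from each tail, e.g. 5 for x = 90
  quantile(stats, probs = c(trim, 100 - trim) / 100, names = FALSE)
}

set.seed(62)
sample_data <- rexp(200, rate = 1 / 50)  # made-up skewed sample
boot_means <- replicate(2000, mean(sample(sample_data, replace = TRUE)))
ci <- percentile_ci(boot_means, x = 90)
ci
```

With x = 90, the corrected expression trims (100 - 90)/2 = 5% from each end. The uncorrected [(1 - [x/100])/2]% evaluates to 0.05%, which is a hundredfold too little.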
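The page 71 correction gives the standard error of the mean as s/sqrt(n). To see why sqrt(n), not n, belongs in the denominator, compare the formula with the spread of bootstrap means. The sample below is simulated and purely illustrative.

```r
# Standard error of the mean: formula s / sqrt(n) versus a bootstrap estimate.
set.seed(71)
x <- rnorm(400, mean = 10, sd = 3)  # simulated sample

se_formula <- sd(x) / sqrt(length(x))  # corrected formula: s / sqrt(n)
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
se_boot <- sd(boot_means)              # spread of the resampled means

c(formula = se_formula, bootstrap = se_boot, wrong = sd(x) / length(x))
```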
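The page 76 correction concerns a rate of 0.2 events per time period. The sketch below assumes the passage describes a Poisson process (the surrounding book text is not reproduced here). Under that assumption, a rate of 0.2 means that Poisson counts per period average 0.2, and that exponential waiting times between events average 1/0.2 = 5 periods.

```r
# A Poisson process with a rate of 0.2 events per time period (simulated).
set.seed(76)
rate <- 0.2
counts <- rpois(100000, lambda = rate)  # number of events in each period
waits  <- rexp(100000, rate = rate)     # time between successive events

mean(counts)  # close to 0.2
mean(waits)   # close to 1 / 0.2 = 5
```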
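The page 136 equation defines RMSE, so its left-hand side should name RMSE rather than a fitted value. The sketch below uses simulated data and assumes the usual definitions: RMSE divides the residual sum of squares by n, and the residual standard error divides it by n - p - 1. Under those definitions, the second quantity should match the sigma that summary() reports for an lm fit.

```r
# RMSE and residual standard error for a fitted linear model (simulated data).
set.seed(136)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)
fit <- lm(y ~ x)

n <- length(y)
p <- 1                              # number of predictors
rss <- sum(residuals(fit)^2)        # residual sum of squares
rmse <- sqrt(rss / n)               # RMSE: divide by n
rse  <- sqrt(rss / (n - p - 1))     # residual standard error: divide by n - p - 1
c(rmse = rmse, rse = rse, lm_sigma = summary(fit)$sigma)
```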
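The corrected page 154 text adds the main-effect slope ($118 per square foot) to the interaction coefficient ($230) to get the slope for the highest zip-code group ($348). The same bookkeeping can be checked on simulated data. The variable names, the two groups, and the true slopes of 100 and 350 are all made up for illustration.

```r
# Slopes in a regression with an interaction term (simulated data).
set.seed(154)
n <- 500
sqft <- runif(n, 1000, 4000)
zip_group <- factor(sample(c("1", "5"), n, replace = TRUE))
# True slopes: 100 per sq ft in group 1, 100 + 250 = 350 in group 5
price <- 50000 + 100 * sqft + 250 * sqft * (zip_group == "5") +
  rnorm(n, sd = 20000)

fit <- lm(price ~ sqft * zip_group)
b <- coef(fit)
slope_group1 <- b[["sqft"]]                           # main effect only
slope_group5 <- b[["sqft"]] + b[["sqft:zip_group5"]]  # main effect + interaction
c(group1 = slope_group1, group5 = slope_group5, ratio = slope_group5 / slope_group1)
```

Applied to the book's figures, 118 + 230 = 348, and 348 / 118 ≈ 2.95. This is consistent with the corrected "more than 2.9".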
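The corrections on pages 196 and 197 both come down to specificity = TN / (TN + FP). Here TN + FP is the number of records with y = 0. A minimal sketch with ten made-up labels:

```r
# Sensitivity and specificity from predicted and actual 0/1 labels.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)  # hypothetical outcomes
predicted <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)  # hypothetical predictions

tp <- sum(predicted == 1 & actual == 1)  # true positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives
fp <- sum(predicted == 1 & actual == 0)  # false positives
tn <- sum(predicted == 0 & actual == 0)  # true negatives

sensitivity <- tp / (tp + fn)  # share of actual 1s predicted as 1
specificity <- tn / (tn + fp)  # share of actual 0s predicted as 0
c(sensitivity = sensitivity, specificity = specificity)
```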
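The corrected page 212 example counts 9 defaulted and 11 paid-off loans among the neighbors, so the majority vote is paid-off. The sketch below implements a plain k-nearest-neighbor vote in base R. Note that knn_predict is a hypothetical helper, not the book's code. The training points are made up so that the 20 neighbors nearest to dti = 22.5, payment_inc_ratio = 9 split 9 to 11.

```r
# k-nearest-neighbor classification by majority vote (Euclidean distance).
knn_predict <- function(train_x, train_y, new_x, k = 20) {
  d <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))  # distance to every training row
  nearest <- order(d)[seq_len(k)]                 # indices of the k closest rows
  votes <- table(train_y[nearest])                # count outcomes among neighbors
  names(votes)[which.max(votes)]                  # most common outcome (first on ties)
}

set.seed(212)
near <- cbind(dti = 22.5 + rnorm(20, sd = 0.5),
              payment_inc_ratio = 9 + rnorm(20, sd = 0.5))
far <- cbind(dti = rep(40, 10), payment_inc_ratio = rep(25, 10))
train_x <- rbind(near, far)
train_y <- c(rep("default", 9), rep("paid off", 11), rep("default", 10))
knn_predict(train_x, train_y, new_x = c(22.5, 9), k = 20)
```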
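The clarified page 221 sentence describes how a tree makes a prediction. Start at the root, go left when the node's condition is true and right when it is false, and stop at a leaf. A minimal sketch of that traversal follows. It uses a hypothetical two-split tree: the variable names echo the loan data, but the split values are invented.

```r
# Predicting with a decision tree: move left when the node's condition is true,
# right when it is false, until a leaf is reached. The tree is hypothetical.
tree <- list(
  condition = function(r) r$borrower_score >= 0.55,
  left  = list(leaf = "paid off"),
  right = list(
    condition = function(r) r$payment_inc_ratio < 10,
    left  = list(leaf = "paid off"),
    right = list(leaf = "default")
  )
)

predict_tree <- function(node, record) {
  while (is.null(node$leaf)) {  # internal nodes have no leaf element
    node <- if (node$condition(record)) node$left else node$right
  }
  node$leaf
}

predict_tree(tree, list(borrower_score = 0.40, payment_inc_ratio = 12))
```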
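The line added on page 281 calls cut() on a dendrogram. That call dispatches to stats::cut.dendrogram(), which returns a list with upper and lower components; the branches below height h are in lower. The self-contained sketch below uses made-up one-dimensional data, because the book's dnd object and data frame are not reproduced here. The data are scaled so that h = 0.5 falls between the two groups.

```r
# Cutting a dendrogram at a given height and listing the members of each branch.
x <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e", "f"), NULL))
hcl <- hclust(dist(x))          # complete linkage by default
dnd <- as.dendrogram(hcl)
dnd_cut <- cut(dnd, h = 0.5)    # branches below height 0.5 go into dnd_cut$lower
lapply(dnd_cut$lower, labels)   # leaf labels in each lower branch
```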
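The author's fix for Table 1-5 uses dplyr and tidyr. For readers who prefer base R, here is an alternative sketch; it is not the uploaded script. cut() assigns each state to a population bin, and split() keeps empty bins, so they appear as empty strings. The five states and their populations are illustrative, and only four bins are used to keep the output short.

```r
# Assign states to population bins and list the abbreviations in each bin,
# keeping empty bins (illustrative data, not the book's data file).
state <- data.frame(
  Abbreviation = c("CA", "TX", "WY", "VT", "AK"),
  Population   = c(37253956, 25145561, 563626, 625741, 710231),
  stringsAsFactors = FALSE
)
state <- state[order(state$Population), ]  # smallest population first
breaks <- seq(min(state$Population), max(state$Population), length.out = 5)
pop_freq <- cut(state$Population, breaks = breaks, right = TRUE,
                include.lowest = TRUE)
# split() keeps empty factor levels, and paste() of an empty vector gives ""
state_abb <- vapply(split(state$Abbreviation, pop_freq), paste, character(1),
                    collapse = ",")
state_abb
```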