Errata

Errata for Practical Statistics for Data Scientists

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
Printed	Page &( Variance formula	The variance formula should include i and n. Then it is consistent with the previous formulas.	AlainB	Feb 09, 2020
Printed	Page 1- code samples	This isn't exactly an erratum, but I think it will trip up anyone who is trying to use the downloaded code samples. The 'ascii' package that is referred to in the following script files (chapter1.r, chapter4.r, chapter7.r, prep_datasets.r) is no longer supported with current versions of R. Another package (huxtable?) should be used instead.	Ben Chapman	Jul 25, 2019
Printed	Page 10 second display formula	In the definition of the weighted mean, the lower limit of the sum in the denominator is incomplete.	Anonymous	Jan 31, 2019
Printed	Page 14 definition of median absolute deviation from the median	"The median of the absolute value of the deviations from the median": "absolute value" should be replaced by "absolute values"	Anonymous	Jan 31, 2019
Printed	Page 30 last display formula near to bottom	N in the definition of the Pearson correlation coefficient should be n; compare the text directly afterwards ("Note that we divide by n-1 ...").	Anonymous	Jan 31, 2019
Printed	Page 31-32 4th paragraph page 31, Figure 1-6	The paragraphs on page 31 that reference Figure 1-6 all document the correlation between the daily returns for major exchange traded funds (ETFs). Figure 1-6 contains none of those ETF symbols (SPY, DIA, QQQ, ...). Figure 1-6 contains stock price correlations (AGN, STJ, CERN, ...).	Anonymous	Nov 26, 2018
Printed	Page 31-32 bottom half on p.31 and figure 1-6 on p. 32	The abbreviations in the text don't match the abbreviations in the figure. It is also unclear how positive and negative correlations are marked, since an ellipse pointed right is the same as an ellipse pointed left (one end in each direction). Perhaps the text means pointed to the upper right corner versus pointed to the lower right corner, but one can't tell because the ellipses in the figure are fairly small.	Anonymous	Jan 31, 2019
Printed	Page 51 3rd paragraph	"... selection of time intervals that accentuate a partiular [!] statistical effect ...": partiular -> particular	Anonymous	Feb 01, 2019
Printed	Page 71 top	The formula uses "sqrt" instead of a square root symbol.	Anonymous	Feb 01, 2019
Printed	Page 72 top half, in further reading	"The original Gosset paper in Biometrica in 1908 [....]": the journal's name is Biometrika, not Biometrica	Anonymous	Feb 01, 2019
Printed	Page 73 third paragraph, below figure	"There is a family of binomial distributions, depending on the values of x, n, and p.": I don't see how the distribution depends on x; it depends only on n and p.	Anonymous	Feb 01, 2019
Printed	Page 115 Figure 3-7	The plot in Figure 3-7 shows the chi-square distributions for varying degrees of freedom, but the legend and the plot do not match. The figure in the book appears to be generated by the R code available on GitHub which plots distributions with degrees of freedom 1, 2, 5, and 20. The legend says 1, 2, 5, and 10.	Brian Loe	Apr 21, 2020
Printed	Page 139 near to bottom of page	"In the 1970s, Hirotugu Akaike [...] deveoped a metric called ...": deveoped -> developed	Anonymous	Feb 07, 2019
Printed	Page 204 5	In the paragraph "An example will make this more explicit. For the model fit in “Logistic Regression and the GLM”, the regression coefficient for purpose_small_business is 1.21226. This means that a loan to a small business compared to a loan to pay off credit card debt reduces the odds of defaulting versus being paid off by exp(1.21226) = 3.4. Clearly, loans for the purpose of creating or expanding a small business are considerably riskier than other types of loans.", did you mean "This means that a loan to a small business compared to a loan to pay off credit card debt increases the odds of defaulting versus being paid off by exp(1.21226) = 3.4"? As far as I understand (based on the text and the coefficient), a small business loan increases the odds of defaulting versus being paid off.	Yerkebulan Kambarov	Oct 13, 2019
Printed	Page 258 First full paragraph	"The is referred to as the within-cluster sum of squares or within-cluster SS." should be: "This is referred to as..."	Mike Levine	May 08, 2018
Printed	Page 271 Last paragraph	The formula for the m-th predictor's weight should be $\alpha_{m}=\frac{1}{2}\log(\frac{1-e_{m}}{e_{m}})$. In other words, the logarithm should not be in the numerator of the fraction, but should be outside the fraction. Ref: https://en.wikipedia.org/wiki/AdaBoost#Example_algorithm_(Discrete_AdaBoost)	Nai-Chia Chen	May 18, 2022
Printed	Page 274 Title of the section - "Selecting the number of Clusters"	"It does this by choosing the number of clusters for which the Bayesian Information Criteria (BIC) has the largest value." I think it should be the other way round, that is, "for which the Bayesian Information Criteria (BIC) has the lowest value".	Shovon Sengupta	Mar 03, 2019
Printed	Page 281 First paragraph	The first step for computing Gower's distance says "for all pairs of variables i and j for each record"; I think it should say "for all pairs of RECORDS i and j for each VARIABLE".	Hugo López-Fernández	Oct 06, 2020
Other Digital Version	3420/7619 1st paragraph under "Influential Values"	Location 3420/7619 in the Kindle Version. In the first sentence under the heading "Influential Values", influential is misspelled as "infuential" (missing the "l" at position 4).	Josh Fleming	Sep 16, 2019
Other Digital Version	4629/7619 Just under the code block	Location 4629/7619 in the Kindle version. The code has this: [[code]] knn_pred == 'paid off' [ 1 ] TRUE [[/code]] And then the text says this: "The KNN prediction is for the loan to default." These seem to contradict each other. If it's true that the knn prediction is "paid off", then the KNN prediction for the loan should be "paid off", not "default".	Josh Fleming	Sep 16, 2019
Other Digital Version	6095/7619 Under "Gower's distance" in the "Key Terms" box	Location 6095/7619 in the Kindle Version. Under "Gower's distance" in the "Key Terms" box, categorical is misspelled as "categoprical" (errant "p" at position 6).	Josh Fleming	Sep 16, 2019
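
The arithmetic behind two of the reports above (page 204 and page 271) can be checked with a short Python sketch. The function names odds_ratio and adaboost_alpha are illustrative assumptions and do not come from the book or its code samples. The coefficient 1.21226 is the value quoted in the page 204 report, and the weight formula is the one given in the page 271 report.

```python
import math

# Coefficient for purpose_small_business as quoted in the page 204 report
# (a logistic regression coefficient, i.e. on the log-odds scale).
COEF_SMALL_BUSINESS = 1.21226


def odds_ratio(coef: float) -> float:
    """Convert a log-odds coefficient into an odds ratio (hypothetical helper)."""
    return math.exp(coef)


def adaboost_alpha(error: float) -> float:
    """Learner weight as written in the page 271 report:
    alpha_m = 0.5 * log((1 - e_m) / e_m), with the log outside the fraction.
    (Hypothetical helper, not taken from the book's code.)
    """
    if not 0.0 < error < 1.0:
        raise ValueError("error must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error) / error)


ratio = odds_ratio(COEF_SMALL_BUSINESS)
# A positive coefficient gives an odds ratio above 1, so the odds of
# default go up rather than down (about 3.36, which rounds to 3.4).
print(f"odds ratio: {ratio:.2f}")

# A learner no better than chance (error 0.5) gets zero weight;
# lower error gives a larger positive weight.
print(f"alpha at error 0.5: {adaboost_alpha(0.5):.3f}")
print(f"alpha at error 0.1: {adaboost_alpha(0.1):.3f}")
```

Because exp(c) exceeds 1 for any positive c, a coefficient of 1.21226 corresponds to a multiplicative increase in the odds, which is the reading proposed in the page 204 report.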