Errata

Practical Statistics for Data Scientists

Errata for Practical Statistics for Data Scientists

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed Page &(
Variance formula

The variance formula should include i and n.

Then it is consistent with the previous formulas.

AlainB  Feb 09, 2020 
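
For reference, an indexed form of the kind this entry asks for might look like the following. It is a sketch that assumes the book defines the sample variance with an n - 1 denominator and uses the same x_i and \bar{x} notation as its mean formulas; it is not the publisher's correction.

$$
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
$$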
Printed Page 1-
code samples

This isn't exactly an erratum, but I think it will trip up anyone who is trying to use the downloaded code samples. The 'ascii' package that is referred to in the following script files (chapter1.r, chapter4.r, chapter7.r, prep_datasets.r) is no longer supported with current versions of R. Another package (huxtable?) should be used instead.

Ben Chapman  Jul 25, 2019 
Printed Page 10
second display formula

In the definition of the weighted mean, the lower limit of the sum in the denominator is incomplete.

Anonymous  Jan 31, 2019 
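
A complete form of the weighted mean, with both sums carrying full limits, could be written as below. The lower limit i = 1 in the denominator is an assumption made to match the numerator; the printed formula may use different symbols.

$$
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
$$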
Printed Page 14
definition of median absolute deviation from the median

"The median of the absolute value of the deviations from the median":
"absolute value" should be replaced by "absolute values"

Anonymous  Jan 31, 2019 
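
The plural matters because one absolute deviation is taken per data point before the outer median is applied. A minimal Python sketch of the unscaled definition follows; the function name and the sample data are hypothetical, and R's `mad` function additionally multiplies by a constant (1.4826 by default), which is omitted here.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    center = median(values)
    # One absolute deviation per observation, then the median of those.
    deviations = [abs(v - center) for v in values]
    return median(deviations)


sample = [1, 2, 3, 4, 100]  # hypothetical data with one outlier
mad = median_absolute_deviation(sample)
```

The outlier 100 shifts the mean substantially but leaves this statistic small, which is the robustness property the definition is meant to capture.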
Printed Page 30
last display formula, near the bottom

N in the definition of the Pearson correlation coefficient should be n; compare the text directly afterwards ("Note that we divide by n-1 ...")

Anonymous  Jan 31, 2019 
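
With a lowercase n throughout, the estimate would read as follows. This sketch assumes s_x and s_y denote the sample standard deviations, consistent with the "divide by n-1" remark quoted in the entry.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}
$$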
Printed Page 31-32
4th paragraph page 31, Figure 1-6

The paragraphs on page 31 that reference Figure 1-6 all document the correlation between the daily returns for major exchange traded funds (ETFs). Figure 1-6 contains none of those ETF symbols (SPY, DIA, QQQ, ...). Figure 1-6 contains stock price correlations (AGN, STJ, CERN, ...).

Anonymous  Nov 26, 2018 
Printed Page 31-32
bottom half on p.31 and figure 1-6 on p. 32

The abbreviations in the text don't match the abbreviations in the figure. It is also unclear how positive and negative correlations are marked, since an ellipse pointed right is the same as an ellipse pointed left (one end in each direction). Perhaps the intended meaning is "pointed to the upper right corner" versus "pointed to the lower right corner", but one can't tell because the ellipses in the figure are fairly small.

Anonymous  Jan 31, 2019 
Printed Page 51
3rd paragraph

"... selection of time intervals that accentuate a partiular [!] statistical effect ...":
partiular -> particular

Anonymous  Feb 01, 2019 
Printed Page 71
top

the formula uses sqrt instead of a square root symbol

Anonymous  Feb 01, 2019 
Printed Page 72
top half, in further reading

"The original Gosset paper in Biometrica in 1908 [....]":
the journal's name is Biometrika, not Biometrica

Anonymous  Feb 01, 2019 
Printed Page 73
third paragraph, below figure

"There is a family of binomial distributions, depending on the values of x, n, and p.":
I don't see how the distribution depends on x; it depends only on n and p.

Anonymous  Feb 01, 2019 
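
The distinction raised here can be illustrated with a short Python sketch: n and p select one member of the binomial family, while x is only the point at which that member's probability is evaluated. The function name and the values n = 5, p = 0.1 are chosen for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Fixing n and p fixes the whole distribution; x merely indexes its support.
n, p = 5, 0.1
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)
```

Changing x moves along the list `pmf`; changing n or p produces a different list altogether.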
Printed Page 115
Figure 3-7

The plot in Figure 3-7 shows the chi-square distributions for varying degrees of freedom, but the legend and the plot do not match. The figure in the book appears to be generated by the R code available on GitHub which plots distributions with degrees of freedom 1, 2, 5, and *20*. The legend says 1, 2, 5, and *10*.

Brian Loe  Apr 21, 2020 
Printed Page 139
near the bottom of the page

"In the 1970s, Hirotugu Akaike [...] deveoped a metric called ...":
deveoped -> developed

Anonymous  Feb 07, 2019 
Printed Page 204
5

In paragraph
"An example will make this more explicit. For the model fit in “Logistic Regression and the GLM”, the regression coefficient for purpose_small_business is 1.21226. This means that a loan to a small business compared to a loan to pay off credit card debt reduces the odds of defaulting versus being paid off by exp(1.21226) = 3.4. Clearly, loans for the purpose of creating or expanding a small business are considerably riskier than other types of loans."

Did you mean "This means that a loan to a small business compared to a loan to pay off credit card debt increases the odds of defaulting versus being paid off by exp(1.21226) = 3.4."?

As far as I understand (based on the text and the coefficient), a small-business loan increases the odds of defaulting versus being paid off.

Yerkebulan Kambarov  Oct 13, 2019 
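
The arithmetic behind this entry can be checked directly. The Python sketch below uses only the coefficient quoted above; the variable names are illustrative. A positive logistic regression coefficient exponentiates to an odds ratio greater than 1, which multiplies the odds upward.

```python
import math

# Coefficient for purpose_small_business, as quoted in the erratum.
coefficient = 1.21226

# Exponentiating a log-odds coefficient gives the multiplicative change in odds.
odds_ratio = math.exp(coefficient)

# An odds ratio above 1 means the odds of the modeled outcome go up, not down.
increases_odds = odds_ratio > 1
```

The result is roughly 3.36, which rounds to the 3.4 given in the quoted paragraph.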
Printed Page 258
First full paragraph

"The is referred to as the within-cluster sum of squares or within-cluster SS."

should be:

"This is referred to as..."

Mike Levine  May 08, 2018 
Printed Page 271
Last paragraph

The formula for the m-th predictor's weight should be $\alpha_{m}=\frac{1}{2}\log(\frac{1-e_{m}}{e_{m}})$. In other words, the logarithm should not be in the numerator of the fraction, but should be outside the fraction.
Ref: https://en.wikipedia.org/wiki/AdaBoost#Example_algorithm_(Discrete_AdaBoost)

Nai-Chia Chen  May 18, 2022 
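
The formula proposed in this entry, with the logarithm applied to the whole ratio, can be sketched in Python as follows. The function name is hypothetical, and the natural logarithm is assumed, as in the referenced discrete AdaBoost description.

```python
import math


def predictor_weight(error_rate):
    """alpha_m = 0.5 * log((1 - e_m) / e_m), with the log outside the fraction."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


weight_chance = predictor_weight(0.5)  # a coin-flip predictor
weight_good = predictor_weight(0.1)    # an accurate predictor
weight_bad = predictor_weight(0.9)     # a predictor worse than chance
```

Under this form, a predictor at chance level receives zero weight, accurate predictors receive positive weight, and predictors worse than chance receive negative weight.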
Printed Page 274
Title of the section - "Selecting the number of Clusters"

"It does this by choosing the number of clusters for which the Bayesian Information Criteria (BIC) has the largest value."

I think it should be the other way round, meaning "for which the Bayesian Information Criteria (BIC) has the lowest value".

Shovon Sengupta  Mar 03, 2019 
Printed Page 281
First paragraph

The first step for computing Gower's distance says "for all pairs of variables i and j for each record". I think it should say "for all pairs of RECORDS i and j for each VARIABLE".

Hugo López-Fernández  Oct 06, 2020 
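
The reading proposed in this entry (a distance per variable for each pair of records, then combined) can be sketched in Python as below. This is a simplified illustration, not the book's or any library's implementation: the function name, the field names, and the use of an unweighted mean are all assumptions.

```python
def gower_distance(record_i, record_j, numeric_ranges):
    """Simplified Gower distance between two records.

    record_i, record_j: dicts mapping variable name to value.
    numeric_ranges: dict mapping each numeric variable to its (min, max)
    over the data set; variables not listed are treated as categorical.
    """
    per_variable = []
    for name, value_i in record_i.items():
        value_j = record_j[name]
        if name in numeric_ranges:
            low, high = numeric_ranges[name]
            span = high - low
            # Range scaling keeps each numeric distance between 0 and 1.
            per_variable.append(abs(value_i - value_j) / span if span else 0.0)
        else:
            # Categorical variables contribute 0 on a match and 1 otherwise.
            per_variable.append(0.0 if value_i == value_j else 1.0)
    # Unweighted mean of the per-variable distances (an assumption).
    return sum(per_variable) / len(per_variable)


# Hypothetical records and range, for illustration only.
record_a = {"dti": 10.0, "home": "RENT"}
record_b = {"dti": 20.0, "home": "OWN"}
distance = gower_distance(record_a, record_b, {"dti": (0.0, 40.0)})
```

Note that the loop runs over variables for one pair of records, which matches the wording suggested in the entry.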
Other Digital Version 3420/7619
1st paragraph under "Influential Values"

Location 3420/7619 in the Kindle Version.

In the first sentence under the heading "Influential Values", influential is misspelled as "infuential" (missing the "l" at position 4).

Josh Fleming  Sep 16, 2019 
Other Digital Version 4629/7619
Just under the code block

Location 4629/7619 in the Kindle version.

The code has this:
[[code]]
knn_pred == 'paid off'
[1] TRUE
[[/code]]

And then the text says this:

"The KNN prediction is for the loan to default."

These seem to contradict each other. If it's true that the knn prediction is "paid off", then the KNN prediction for the loan should be "paid off", not "default".

Josh Fleming  Sep 16, 2019 
Other Digital Version 6095/7619
Under "Gower's distance" in the "Key Terms" box

Location 6095/7619 in the Kindle Version.

Under "Gower's distance" in the "Key Terms" box, categorical is misspelled as "categoprical" (errant "p" at position 6).

Josh Fleming  Sep 16, 2019