Errata

Practical Statistics for Data Scientists

Errata for Practical Statistics for Data Scientists, Second Edition


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version Location Description Submitted by Date submitted
PDF Page Statistical Significance and p-Values
Code example

This section reuses the function perm_fun(), which was defined in the "Resampling" section. The original function calculates the difference in means between the resampled groups, but the example presented here needs the difference in proportions. A proposed fix is to define a new function that replaces this line of code:

return x.loc[list(idx_B)].mean() - x.loc[list(idx_A)].mean()

with this:

return x.loc[list(idx_B)].sum() / nB - x.loc[list(idx_A)].sum() / nA

(The code is in Python)
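
For illustration, a minimal sketch of the proportion version as a standalone function. The index-sampling lines are assumed to match perm_fun() from the book's "Resampling" section, and the name perm_fun_prop is hypothetical:

```python
import random
import pandas as pd

def perm_fun_prop(x, nA, nB):
    # Permutation statistic for a difference in proportions:
    # x is a pandas Series of 0/1 outcomes for both groups pooled together,
    # with a default 0..n-1 integer index.
    n = nA + nB
    idx_B = set(random.sample(range(n), nB))  # random resample of size nB
    idx_A = set(range(n)) - idx_B             # remaining n - nB observations
    return x.loc[list(idx_B)].sum() / nB - x.loc[list(idx_A)].sum() / nA
```

It would be used the same way as perm_fun(), e.g. `perm_diffs = [perm_fun_prop(conversion, nA, nB) for _ in range(1000)]`, where `conversion` is the pooled Series of 0/1 outcomes.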

Anonymous  May 02, 2023 
Printed Page A/B Testing
N/a

I had a question about A/B testing which I hope you can help with.

My situation is that I am building a propensity model to identify which of our company's customers are most likely to sign up for a service after being sent an email. The model produces a likelihood score between 0 and 1 for each member and ranks them from 1 to X, where X is the size of our customer base. In practice we would then select the top N from this list to email. So, if our customer base is 1 million members, we might select the 100,000 the model ranks most highly (i.e., those with the highest likelihood scores).

I want to compare how good the model is at identifying likely signups compared to the current business rules. However, I do not know how to conduct an A/B test in this scenario. Or indeed whether an A/B test is the most appropriate test here.

I understand the usual principle of randomly splitting the population into two groups and applying a different treatment to each, such as a webpage layout or a drug. However, in my case, the thing whose efficacy we are testing is the selection method itself (the email we send to each group would be the same). If we were to randomly split the population into two groups, there would likely be customers whom the model has ranked very highly but who end up in the business-rules group. That seems to me like an unfair test of the model, because we are not giving it the chance to prove itself: we are not letting it have all of its 'top picks'.

Do you have any advice on this?

Anonymous  Mar 19, 2024 
Printed Page 53
Sample Mean Versus Population Mean

The symbol used to represent the mean of population is missing.

Anonymous  Feb 07, 2022 
Printed Page 53
3rd paragraph

The symbol for the mean of a population is left out in "…whereas is used to represent the mean of a population."

John Taylor  Jul 30, 2022 
Printed, PDF Page 99
4th paragraph, which starts with "Page B has session times that are greater than those of page A by 35.67 seconds"

I think that in "The question is whether this difference is within the range of what random chance might produce, i.e., is statistically significant", the final phrase "i.e., is statistically significant" should be removed.

Mohammed Kamal Alsyd  Jul 21, 2023 
Printed Page 137
3rd paragraph

The earlier draws presented on both the previous page and page 137 suggest that however many ones Box A gets, Box B gets enough zeros for the two boxes to total 10,000. That may not have been intended as a strict rule, but if it was, then for consistency the 3rd paragraph of page 137, which raises the count of ones in Box A to 165 (1.65%), should give Box B the remainder of 10,000, i.e., 9,835 zeros (not 9,868).

Emir Bilim  Dec 24, 2021 
Printed Page 189
Figure 4-10

**Figure 4-10 — LOWESS target in partial-residual plot**

In the associated code in the GitHub repository for Figure 4-10, the LOWESS smoother is applied to the component (`results.ypartial`) rather than to the **partial residuals**. Replace:

```python
smoothed = sm.nonparametric.lowess(results.ypartial, results.feature, frac=1/3)
```

with:

```python
smoothed = sm.nonparametric.lowess(
    results.ypartial + results.residual,  # PR_i(x)
    results.feature,
    frac=1/3,
)
```

**Rationale.** The gray dashed line is intended to be a LOWESS smooth of the **partial residuals** to show the empirical relationship without imposing a polynomial. Smoothing the component (the black line) instead does not match the textbook definition of a partial-residual plot.
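
For illustration, a minimal plotting sketch, assuming (as in the repository code) that `results` is a DataFrame with `feature`, `ypartial` (the fitted component), and `residual` columns; the variable names below are otherwise illustrative:

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Partial residuals: fitted component for the feature plus the model residuals.
partial_resid = results.ypartial + results.residual

# lowess() returns an (n, 2) array of (x, smoothed y) pairs, sorted by x.
smoothed = sm.nonparametric.lowess(partial_resid, results.feature, frac=1/3)

srt = results.sort_values('feature')  # sort so the component plots as a line
fig, ax = plt.subplots()
ax.scatter(results.feature, partial_resid, alpha=0.3)        # partial residuals
ax.plot(smoothed[:, 0], smoothed[:, 1], '--', color='gray')  # LOWESS of partial residuals
ax.plot(srt.feature, srt.ypartial, color='black')            # fitted (polynomial) component
ax.set_xlabel('feature')
ax.set_ylabel('partial residual')
plt.show()
```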

Nahid Ahmadvand  Sep 20, 2025 
PDF Page 200
Numeric Predictor Variables

Note from the Author or Editor:
------------------
Hello,

I looked at this more. In contrast to the R implementation, GaussianNB treats categorical features as numerical. This is not correct. We can see this if we build a model with categorical features only. With the MultinomialNB, we get this prediction:
array([[0.65369619, 0.34630381]])
while the GaussianNB results in this prediction:
array([[9.99994372e-01, 5.62841090e-06]])
so essentially [1, 0].

It will be necessary to build two separate models and then combine the predictions.
------------------
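
For illustration, a minimal sketch of the two-model combination suggested in the note above, assuming scikit-learn's GaussianNB and MultinomialNB (the helper name combined_nb_proba and the argument names are hypothetical). Under the naive independence assumption, the combined log-posterior is the sum of the two models' log-posteriors minus one copy of the class log-prior, renormalized:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

def combined_nb_proba(X_num, X_cat, y, X_num_new, X_cat_new):
    """Fit GaussianNB on the numeric columns and MultinomialNB on the
    0/1 dummy columns, then combine them into one Naive Bayes posterior."""
    gnb = GaussianNB().fit(X_num, y)
    mnb = MultinomialNB().fit(X_cat, y)

    # Each model's log-posterior already includes the class prior once,
    # so subtract one copy before renormalizing.
    log_joint = (gnb.predict_log_proba(X_num_new)
                 + mnb.predict_log_proba(X_cat_new)
                 - np.log(gnb.class_prior_))

    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    proba = np.exp(log_joint)
    return proba / proba.sum(axis=1, keepdims=True)
```
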
My Response:

Thanks for taking a closer look.
You’re absolutely right that GaussianNB treats all inputs as continuous. If we feed it one-hot encoded categoricals, those 0/1 dummies are modeled as Gaussians with very small within-class variances, which can drive extreme posteriors (the near [1, 0] you observed when using only categoricals). That behavior is consistent with a mismatch between the model family (Gaussian) and the data type (categorical).

That said, if most of the signal lies in the continuous features and the goal is ranking by P(Y=1), a GaussianNB-centric approach can work quite well even if the categorical piece is handled imperfectly. As the book notes, Naive Bayes often provides decent ranking but biased probability estimates; calibration (e.g., CalibratedClassifierCV with isotonic or sigmoid) helps if calibrated probabilities matter.
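
A minimal calibration sketch along those lines, assuming scikit-learn's CalibratedClassifierCV (X_num, y, and X_num_new are placeholder names for training features, labels, and new data):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB

# Wrap the Naive Bayes model so its probabilities are recalibrated via
# cross-validation (isotonic regression here; method='sigmoid' also works).
calibrated_nb = CalibratedClassifierCV(GaussianNB(), method='isotonic', cv=5)
calibrated_nb.fit(X_num, y)
calibrated_probs = calibrated_nb.predict_proba(X_num_new)
```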

Best,

Nahid Ahmadvand  Sep 23, 2025 
Printed Page 278
Inset on Ridge regression and the Lasso

The indices of X should be X_{p,i} (cf. p. 151), but as it currently stands we have X_i and X_p. Shouldn't `i` refer to the example index and `p` refer to the dimension index?
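
For reference, a sketch of how the penalized residual sum of squares would read with that convention, where i = 1, …, n indexes records and p = 1, …, P indexes predictors (the exact symbols in the printed inset may differ):

$$\sum_{i=1}^{n}\Bigl(Y_i - b_0 - \sum_{p=1}^{P} b_p X_{p,i}\Bigr)^2 + \lambda \sum_{p=1}^{P} \lvert b_p \rvert$$

with ridge regression replacing the penalty term $\lvert b_p \rvert$ by $b_p^2$.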

Amine Laghaout  Feb 15, 2023