# Errata for Introduction to Machine Learning with Python

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version | Location | Description | Submitted By | Date Submitted | Date Corrected
Safari Books Online
safari app (no pages available)
Chapter 2, section on Lasso, just after In[39]:

Decreasing alpha to 0.01, we obtain the solution shown as the green dots Should be "red" dots

Note from the Author or Editor:
"red dots" needs to be replaced with "upward pointing triangle", "shown in teal" should be "shown as circles".

Thierry Herrmann  Oct 08, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 2, section on Naive Bayes Classifiers, subsection Strengths, weaknesses, and parameters, 2nd paragraph

... performs better than BinaryNB should be "BernoulliNB"

Note from the Author or Editor:
Should be BernoulliNB indeed.

Thierry Herrmann  Oct 08, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
in all notebook cells from In[93] up to In[97] and in the text just above In[94]

For scikit-learn 0.18, as mentioned in chapter 1, replace the 'algorithm' parameter with 'solver' and 'l-bfgs' with 'lbfgs'. Thanks (sorry for the duplicate errata submission; I got a proxy error from O'Reilly at the 1st attempt).

Note from the Author or Editor:
As described, l-bfgs should always be lbfgs and "algorithm" in the code (or in fixed-width in the text) should always be "solver".

Thierry Herrmann  Oct 10, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 3, section "Applying PCA to the cancer dataset for visualization", just below the graph after In[17]:

"We can also see that the malignant (red) points are more spread out than the benign (blue) points" In the text, 'red' and 'blue' should be swapped to match the graph (or swap the colors in the graph)

Note from the Author or Editor:
Simply remove "red" and "blue" from the text.

Thierry Herrmann  Oct 13, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 3, section "Eigenfaces for feature extraction", just below Out[28]:

"The input space here is 50×37-pixel grayscale images, so directions within this space are also 50×37-pixel grayscale images" Replace 50x37 with 87x65 since people.images.shape: (3023, 87, 65) (was wondering where 5655 was coming from in pca.components_.shape: (100, 5655))

Note from the Author or Editor:
Replace 50x37 with 87x65 everywhere in the text.

Thierry Herrmann  Oct 13, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 4, section on One-Hot-Encoding, code in cell In[2]

data = pd.read_csv( "/home/andy/datasets/adult.data", header=None, index_col=False, names=['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'gender', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']) It's unlikely /home/andy/datasets/adult.data will work for people who'll copy/paste the code (also in github as of this writing). The original data set should work: data = pd.read_csv( "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", ...) (or include the dataset in github and use relative path)

Note from the Author or Editor:

Thierry Herrmann  Oct 19, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 4, section on Univariate Nonlinear Transformations, text after Out[33]

"The value 2 seems to be the most common, with 62 appearances ..." should be 68 appearances

Note from the Author or Editor:
62 should be 68.

Thierry Herrmann  Oct 19, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 4, section "Utilizing Expert Knowledge", text below figure 4.16

"The reason for this is that we encoded day of week and time of day using integers, which are interpreted as categorical variables" should be 'continuous' variables. The next sentence, saying that we do need 'categorical' variables in this case, is correct.

Note from the Author or Editor:
"categorical" should be "continuous"

Thierry Herrmann  Oct 19, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 5, section "Using Pipelines in Grid Searches", "Illustrating Information Leakage"

Very minor typo: the text mentions: "regression task with 100 samples and 1,000 features" but the code uses 10000 features: "X = rnd.normal(size=(100, 10000))"

Note from the Author or Editor:
Text should say 10000 features

Thierry Herrmann  Oct 23, 2016  Jan 13, 2017
Safari Books Online
safari app (no pages available)
Chapter 7, section "Topic Modeling and Document Clustering", text just above In[41]:

Text says: "We’ll remove words that appear in at least 20 percent of the documents, and we’ll limit the bag-of-words model to the 10,000 words that are most common after removing the top 20 percent" but the code uses max_df=.15

Note from the Author or Editor:
At the bottom of page 348, before In[41]: "20 percent" in the text should be replaced by "15 percent" for both occurrences.

Thierry Herrmann  Oct 25, 2016  Jan 13, 2017
Printed
last paragraph

(1st edition) below Out[34], "kernel parameter is always set to 'rbf' (not that the entry for kernel is a list of length on)" It seems "not" is a misspelling of "note".

Haesun Park  Feb 25, 2017  Jun 09, 2017
Safari Books Online
Chapter 2
Predicting Probabilities

"We’ve reproduced this in Figure 2-57, and we encourage youto go though the example there." "through" instead of "though"

Mirwaisse DJANBAZ  Oct 22, 2017  Oct 19, 2018
Safari Books Online
Chapter 3
APPLYING PCA TO THE CANCER DATASET FOR VISUALIZATION

"Each plot overlays two histograms, one for all of the points in the benign class (blue) and one for all the points in the malignant class (red)." Blue --> green Red --> blue

Note from the Author or Editor:
The colors should be removed given the b&w print. The legend should be sufficient explanation. Please remove the parenthesis.

Mirwaisse DJANBAZ  Oct 23, 2017  Oct 19, 2018
Mobi
Page vii
last paragraph

The link to "The Elements of Statistical Learning" under the text "the authors’ website." is incorrect. The correct link is https://web.stanford.edu/~hastie/pub.htm

Note from the Author or Editor:
It should be http://web.stanford.edu/~hastie/ElemStatLearn/

Gabor Szabo  Nov 27, 2017  Oct 19, 2018
ePub
Below figure 2-27

„Following the branches to the right, we see that worst radius <= 16.795 creates a node that contains only 8 benign but 134 malignant samples“ Should be > „Taking a left at the root, for worst radius > 16.795 we end up with 25 malignan“ Should be <=

Note from the Author or Editor:
Indeed, left is "true" right is "false" so <= and > should be exchanged.

Mile Dragosavac   Dec 01, 2017  Oct 19, 2018
ePub
Below figure 2-29

„meaning we cannot say “a high value of X[0] means class 0, and a low value means class 1” (or vice versa).“ From the root's perspective, and taking into account that X[0] is not relevant for splitting the data, it should be: „meaning we cannot say a high value of X[1] means class 1, and a low value means class 0” (or vice versa).“

Note from the Author or Editor:
Indeed should be X[1] instead of X[0]

Mile Dragosavac  Dec 01, 2017  Oct 19, 2018
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 11
bottom of page

Earlier versions of the book were missing "from IPython import display" in the import statements in the note at the bottom of page 11 (top of page 12 in newer versions).

Andreas C Müller  Apr 25, 2017  Jun 09, 2017
PDF
Page 12
1st paragraph under the figure

The text says: With gamma=0.05, performance drastically improves to an AUC of 0.5. In the output, the value is 0.9. As it should be, otherwise the explanation wouldn't make sense :).

Note from the Author or Editor:
Page 298 first paragraph should be " With gamma=0.05, performance drastically improves to an AUC of 0.9."

Joaquin Vanschoren  Feb 15, 2017  Jun 09, 2017
PDF
Page 14
fourth paragraph

"is the foundation upon which machine learning is BUILD" should be "is the foundation upon which machine learning is BUILT"

A Aziz  Apr 27, 2017  Jun 09, 2017
PDF
Page 16
Jupyter Notebook

In line 5 of 1st paragraph under the topic Jupyter Notebook: The "Jypyter" Notebook makes it easy to incorporate......

Manpreet Singh  Sep 29, 2016  Sep 22, 2016
Printed
Page 16
In[17] and Out[17]

"First five columns" should be "First five raws"

Note from the Author or Editor:
Should be "First five rows".

HIDEMOTO NAKADA  Feb 10, 2017  Jun 09, 2017
Printed
Page 20
Code Block

If using a newer version of pandas (e.g., 0.24.1), the scatter_matrix function is actually inside the plotting module and needs to be called like this: pd.plotting.scatter_matrix(...)

Note from the Author or Editor:
Which print of the book are you using? This has been corrected in more recent prints.

Cristian Varela  Feb 09, 2019
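
A minimal sketch of the call form described in this report, assuming a recent pandas release and a non-interactive matplotlib backend. The toy DataFrame and its column names are hypothetical stand-ins, not the dataset used in the book.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the book's dataset
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])

# In newer pandas releases, scatter_matrix is reached through pandas.plotting
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6), marker="o")

# One subplot per pair of columns, including the diagonal
print(axes.shape)
```

The returned array holds one Axes per (row, column) pair, so a frame with three columns yields a 3 x 3 grid.
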
PDF
Page 34
4th paragraph

The book references "91 possible combinations of two features within those 13" and further clarifies in the foot note to use "13 choose 2" . 13 choose 2 is 78, 14 choose 2 is 91.

Note from the Author or Editor:
The main text should be "91 possible combinations of two features within those 13 (with replacement)" The footnote should say "This is 13 interactions for the first feature, plus 12 for the second not involving the first, plus 11 for the third and so on. 13 + 12 + 11 + ... + 1 = 91"

Mike Hancock  Oct 18, 2016  Jan 13, 2017
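
The three counts discussed in this entry can be checked with a few lines of standard-library Python. This is only an arithmetic sketch; the variable names are illustrative and do not come from the book.

```python
from itertools import combinations_with_replacement
from math import comb

n_features = 13

# "13 choose 2": pairs of two distinct features
pairs_without_replacement = comb(n_features, 2)

# Pairs where a feature may also be paired with itself (with replacement)
pairs_with_replacement = sum(
    1 for _ in combinations_with_replacement(range(n_features), 2)
)

# The running sum from the corrected footnote: 13 + 12 + ... + 1
running_sum = sum(range(1, n_features + 1))

print(pairs_without_replacement, pairs_with_replacement, running_sum)
```

The count with replacement and the running sum agree at 91, while the plain "13 choose 2" gives 78, which is the discrepancy the reporter pointed out.
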
PDF
Page 40
Paragraph below figure

In "In other words, using few neighbors corresponds to high model complexity (as shown on the right side of Figure 2-1), and using many neighbors corresponds to low model complexity (as shown on the left side of Figure 2-1)" left and right should be reversed.

Andreas Mueller  Jan 18, 2017  Jun 09, 2017
PDF, Safari Books Online
Page 45
1st paragraph of section "Linear models for regression"

"For regression, the general prediction formula for a linear model looks as follows: ŷ = w[0] * x[0] + w[1] * x[1] + ... + w[p] * x[p] + b Here, x[0] to x[p] denotes the features (in this example, the number of features is p)..." There are p+1 features in total.

Note from the Author or Editor:
Should be "the number of features is p+1"

Anonymous  Nov 11, 2016  Jan 13, 2017
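
The indexing point can be made concrete with a small sketch of the prediction formula quoted above. The function name and the numbers are hypothetical; the only claim is that indices 0 through p cover p + 1 features.

```python
def predict_linear(w, x, b):
    """Compute y_hat = w[0]*x[0] + ... + w[p]*x[p] + b for one sample."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


# Illustrative values with p = 2, so indices run over 0, 1, 2
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
b = 0.25

p = len(x) - 1
n_features = len(x)  # p + 1 features in total
y_hat = predict_linear(w, x, b)
print(p, n_features, y_hat)
```
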
Printed
Page 47
The first paragraph of "Linear regression (aka ordinary least squares)"

"The mean squared error is the sum of the squared differences between the predictions and the true values." The "mean squared error" is not the sum but the average of the squared differences.

Note from the Author or Editor:
Should be "The mean squared error is the sum of the squared differences between the predictions and the true values, divided by the number of samples."

HIDEMOTO NAKADA  Feb 11, 2017  Jun 09, 2017
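
The difference between the sum and the mean of the squared differences is easy to see in a short sketch. The helper below is a plain-Python illustration with made-up numbers, not the scikit-learn implementation.

```python
def sum_of_squared_errors(y_true, y_pred):
    """Sum of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mean_squared_error(y_true, y_pred):
    """Sum of squared differences divided by the number of samples."""
    return sum_of_squared_errors(y_true, y_pred) / len(y_true)


# Hypothetical targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

sse = sum_of_squared_errors(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(sse, mse)
```
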
Printed
Page 48,53,54

1st edition, 1st release p48, above In[29], "506 samples and 105 derived features." --> "506 samples and 104 derived features." p 53, under Out[36], "only 4 of the 105 features." --> "only 4 of the 104 features." p54, under Out[37], "using only 33 of the 105 features." --> "using only 33 of the 104 features." because load_extended_boston() does not have bias term.

Haesun Park  Jul 17, 2017  Oct 19, 2018
Printed
Page 49
footnote

(1st edition) In footnote about L2 regularization, "Ridge penalizes the L2 norm of coefficients" should be "Ridge penalizes the squared L2 norm of coefficients"

Haesun Park  Apr 28, 2017  Jun 09, 2017
PDF
Page 52
paragraph starts with 'Here, alpha=0.1 '

"Here, alpha=0.1 seems to be working well. We could try decreasing alpha even more to improve generalization. " 'decreasing' here should be 'increasing' since with larger alpha, we will have stronger regularization and hence better generalization.

Note from the Author or Editor:
"decreasing" is correct but the sentence is slightly misleading and should be rephrased, to "We could try decreasing alpha even more to improve test-set score."

PDF, Safari Books Online
Page 55
1st paragraph (beneath the plot)

"Using alpha=0.00001, we get a model that is quite unregularized,..." Should be: "...alpha=0.0001..."

Anonymous  Nov 17, 2016  Jan 13, 2017
Printed
Page 58
3rd paragraph

"Most of the points in class 0 are at the top, and most of the points in class 1 are ath the bottom" should be "Most of the points in class 0 are at the bottom, and most of the points in class 1 are ath the top"

Note from the Author or Editor:
bottom and top should be exchanged.

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 58
Above Fig. 2-16

(1st edition) In the last sentence above Fig. 2-16, "Here is an illustration using SVC" is better than "Here is an illustration using LinearSVC", because plot_linear_svc_regularization() uses SVC.

Note from the Author or Editor:
I changed the code to use LinearSVC which makes more sense at this point in the book.

Haesun Park  Apr 28, 2017  Jun 09, 2017
Printed
Page 59
2nd paragraph

"Let's analyze LinearLogistic in more detail .." should be "Let's analyze LogisticRegression in more detail .."

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed, PDF
Page 60
Code in listing In[45]:

Value of C in label should be 0.01 i.e. label="C=0.01" instead of label="C=0.001" in the below line: plt.plot(logreg001.coef_.T, 'v', label="C=0.001")

Note from the Author or Editor:
Thanks! Good catch.

Anonymous  Feb 05, 2019
Printed
Page 63
Figure 2-17

The X-axis label should be 'Feature' instead of 'Coefficient index'. The same applies to Figure 2-18. The code that generates these figures also needs to be fixed.

HIDEMOTO NAKADA  Mar 31, 2017  Jun 09, 2017
Printed
Page 72
1st paragraph

"Splitting the dataset vertically at x[1]=0.0596 yields the most information; it best separates the points in class 1 from the points in class 2." should be "Splitting the dataset horizontally at x[1]=0.0596 yields the most information; it best separates the points in class 0 from the points in class 1."

Haesun Park  Dec 18, 2016  Jan 13, 2017
ePub
Page 72

„The top node, also called the root, represents the whole dataset, consisting of 75 points belonging to class 0 and 75 points belonging to class 1“ Should be 50 points to each class. Dataset consists of 100 points. The right part of figure 2-24 shows the root having 50 points in each class.

Note from the Author or Editor:
75->50

Mile Dragosavac   Dec 01, 2017  Oct 19, 2018
Printed
Page 77
1st paragraph, 2nd paragraph

1st paragraph: "The n_samples shown in each node in Figure 2-27 gives the number of samples in that node,.." should be "The samples shown in each node in Figure 2-27 gives the number of samples in that node,.." 2nd paragraph: "Nearly all of the benign samples end up in the second leaf from the right,.." should be "Nearly all of the benign samples end up in the second leaf from the left,.."

Note from the Author or Editor:
Remove "n_" before samples.

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 77
Above "Feature importance in trees" section

(1st edition) In 1st, 2nd paragraphs, I think that <=, > signs are flipped. "worst radius <= 16.795 creates a node that contains only 8 benign but 134 malignant samples. ... for worst radius > 16.795 we end up with 25 malignant and 259 benign samples" should be "worst radius > 16.795 creates a node that contains only 8 benign but 134 malignant samples. ... for worst radius <= 16.795 we end up with 25 malignant and 259 benign samples"

Note from the Author or Editor:
Confirmed also in the newest print.

Haesun Park  Apr 28, 2017  Jun 09, 2017
Printed
Page 78
2nd paragraph

"However, if a feature has a low feature_importance,.." should be "However, if a feature has a low feature_importance_,.."

Note from the Author or Editor:
It should say "has a low value in feature_importance_" with a trailing underscore.

Haesun Park  Dec 18, 2016  Jan 13, 2017
ePub
Page 78.9
Near Figure 2-5

In explaining Figure 2-5, the authors switch from describing the new data points as stars to crosses. It is very confusing. I think the authors meant to say that the new data points are stars. The authors say that but then go on to mention crosses in the figure.

Note from the Author or Editor:
should be stars everywhere indeed.

Anonymous  Dec 19, 2018
Printed
Page 80
line 1

"a high value of X[0] means class 0, and a low value means class 1" X[0] should be X[1].

Note from the Author or Editor:
Page 82 line 1, "X[0]" should be "X[1]"

Hidemoto Nakada  Feb 01, 2017  Jun 09, 2017
Printed
Page 88
5th paragraph

"max_features=sqrt(n_features) for classification and max_features=log2(n_features) for regression" In RandomForestRegressor, max_features default is n_features not log2(n_features)

Note from the Author or Editor:
It should say "max_features=n_features for regression"

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 92
4th paragraph

"You can find the details in Chapter 1 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning" should be "You can find the details in Chapter 12 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning"

Note from the Author or Editor:
It should be "Chapter 12"

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 94
In[78]

first comment: "# add the squared first feature" should be "# add the squared second feature"

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 95
last sentence

ax.set_zlabel("feature0 ** 2") should be ax.set_zlabel("feature1 ** 2")

Note from the Author or Editor:
I think that is a duplicate, but I'm not sure if this location was reported before.

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 95
last line

The last line of In[79] should be: ax.set_zlabel("feature1 ** 2") instead of: ax.set_zlabel("feature0 ** 2")

Jess D  Dec 30, 2016  Jan 13, 2017
Printed
Page 98
equation in the middle

k_rbf(x_1, x_2) = exp(\gamma||x_1 - x_2||^2) should be k_rbf(x_1, x_2) = exp(-\gamma||x_1 - x_2||^2)

Note from the Author or Editor:
missing minus sign

Haesun Park  Dec 18, 2016  Jan 13, 2017
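
A small sketch shows why the minus sign matters: with it, the kernel value is 1 for identical points and decays toward 0 as the points move apart. The function below is a plain-Python illustration of the corrected formula with hypothetical inputs, not scikit-learn's implementation.

```python
import math


def rbf_kernel(x1, x2, gamma):
    """k_rbf(x1, x2) = exp(-gamma * ||x1 - x2||^2), with the minus sign."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


# Hypothetical points and gamma
origin = [0.0, 0.0]
near = [0.1, 0.0]
far = [3.0, 4.0]
gamma = 0.1

k_same = rbf_kernel(origin, origin, gamma)
k_near = rbf_kernel(origin, near, gamma)
k_far = rbf_kernel(origin, far, gamma)
print(k_same, k_near, k_far)
```

Without the minus sign the value would grow with distance and exceed 1, which would not behave like a similarity measure.
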
Printed
Page 100
2nd paragraph

"Increasing C, as shown on the bottom right, allows these points to have a stronger influence on the model and makes the decision boundary bend to correctly classify them." I think "Increasing C, as shown on the bottom left, ..." is better.

Note from the Author or Editor:
"bottom right" should be replaced by "bottom left"

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 102
In Preprocessing for data for SVMs

The book disagrees with the sklearn website on how to scale for SVMs. It should be explained more clearly that the right choice of scaling depends on data and model, and StandardScaler would also be a valid approach.

Andreas C Müller  Oct 30, 2017  Oct 19, 2018
Printed
Page 107
equation in the middle

h[0] = tanh(w[0,0]*x[0] + w[1,0]*x[1] + w[2,0]*x[2] + w[3,0]*x[3]) h[1] = tanh(w[0,0]*x[0] + w[1,0]*x[1] + w[2,0]*x[2] + w[3,0]*x[3]) h[2] = tanh(w[0,0]*x[0] + w[1,0]*x[1] + w[2,0]*x[2] + w[3,0]*x[3]) y_hat = v[0]*h[0] + v[1]*h[1] + v[2]*h[2] I think it should be h[0] = tanh(w[0,0]*x[0] + w[1,0]*x[1] + w[2,0]*x[2] + w[3,0]*x[3]+b[0]) h[1] = tanh(w[0,1]*x[0] + w[1,1]*x[1] + w[2,1]*x[2] + w[3,1]*x[3]+b[1]) h[2] = tanh(w[0,2]*x[0] + w[1,2]*x[1] + w[2,2]*x[2] + w[3,2]*x[3]+b[2]) y_hat = v[0]*h[0] + v[1]*h[1] + v[2]*h[2]+b

Haesun Park  Dec 18, 2016  Jan 13, 2017
PDF
Page 107
Formulas in middle

I think this paragraph: h[0] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) h[1] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) h[2] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) should rather be: h[0] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) h[1] = tanh(w[0, 1] * x[0] + w[1, 1] * x[1] + w[2, 1] * x[2] + w[3, 1] * x[3]) h[2] = tanh(w[0, 2] * x[0] + w[1, 2] * x[1] + w[2, 2] * x[2] + w[3, 2] * x[3])

Note from the Author or Editor:
Indeed that's a pretty clear mistake.

Abraham Louw  Apr 17, 2019
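
The corrected indexing in the two reports above can be sketched as a tiny forward pass in which hidden unit j uses column j of the weights, w[i, j]. The bias terms follow the first report and can be set to zero to match the second. All numbers and the function name are hypothetical.

```python
import math


def forward(x, w, b_hidden, v, b_out):
    """One hidden layer with tanh units; w[i][j] connects input i to hidden unit j."""
    n_hidden = len(v)
    h = [
        math.tanh(sum(w[i][j] * x[i] for i in range(len(x))) + b_hidden[j])
        for j in range(n_hidden)
    ]
    y_hat = sum(v[j] * h[j] for j in range(n_hidden)) + b_out
    return h, y_hat


# Illustrative values: 4 inputs, 3 hidden units
x = [1.0, 0.0, -1.0, 2.0]
w = [
    [0.1, 0.2, 0.3],
    [0.0, 0.0, 0.0],
    [0.1, 0.1, 0.1],
    [0.2, -0.1, 0.0],
]
b_hidden = [0.0, 0.0, 0.0]
v = [1.0, 1.0, 1.0]
b_out = 0.0

h, y_hat = forward(x, w, b_hidden, v, b_out)
print(h, y_hat)
```

Because each hidden unit reads a different weight column, the three activations differ; with w[.., 0] reused everywhere, as in the uncorrected formulas, all three would be identical.
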
Printed
Page 110
1st paragraph

"If we want a smoother decision boundary, we could add more hidden units (as in Figure 2-49), add a second hidden layer (Figure 2-50)" should be "If we want a smoother decision boundary, we could add more hidden units (as in Figure 2-48), add a second hidden layer (Figure 2-50)"

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 118
3rd paragraph

"you are learning 100 * 1,000 = 100,000 weights from the input to the hidden layer and 1,000 x 1 weights from the hidden layer to the output layer" I think it's better than above: "you are learning 100 * 1,000 = 100,000 weights from the input to the hidden layer and 1,000 * 1 = 1,000 weights from the hidden layer to the output layer"

Note from the Author or Editor:
change "x" to "*"

Haesun Park  Dec 18, 2016  Jan 13, 2017
Printed
Page 118
3rd paragraph

introspect is used instead of inspect

Note from the Author or Editor:
Inspect is probably better here.

Gabriela Hempfling  May 06, 2018  Oct 19, 2018
Printed
Page 119
In[105]

from sklearn.datasets import make_blobs, make_circles should be from sklearn.datasets import make_circles

Haesun Park  Dec 18, 2016  Jan 13, 2017
PDF
Page 142
Applying PCA to the cancer dataset for visualization,1st paragraph

"This dataset has 30 features, which would result in 30 * 14 = 420 scatter plots!" Why did you multiply by 14 ?

Note from the Author or Editor:
It should actually be 29 * 15, I realize now. All possible combinations of features are n * (n-1)/2, so 30 * 29 / 2 = 435. In a scatter matrix, the diagonal is not pairwise plots, and the upper and lower triangle are transposed. So to show all pairwise interactions, we need to plot all the plots in either the upper or lower triangle of the scatter matrix. If we plot the whole scatter matrix, obviously we'd need 30 * 30 many plots.

Anonymous  Aug 25, 2017  Oct 19, 2018
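
The counts in the author's note can be reproduced with a short arithmetic sketch; the variable names are illustrative.

```python
from math import comb

n_features = 30

# Distinct pairs of features: n * (n - 1) / 2
pairwise_plots = comb(n_features, 2)

# Every cell of a full scatter matrix, including the diagonal and both triangles
full_matrix_plots = n_features * n_features

print(pairwise_plots, full_matrix_plots)
```
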
Printed
Page 147
1st paragraph

At the end of the first sentence, "(it's negative," should be "(it's positive" because all features of the first component have positive values.

Haesun Park  Jan 04, 2017  Jan 13, 2017
Printed
Page 151
Out[27]

X_train_pca.shape: (1537, 100) should be X_train_pca.shape: (1547, 100)

Haesun Park  Jan 04, 2017  Jan 13, 2017
Printed
Page 154
Last sentence of the 1st paragraph

Following sentence: "Here, we visualize the reconstruction of some faces using 10, 50, 100, 500, or 2,000 components" Should be changed to: "Here, we visualize the reconstruction of some faces using 10, 50, 100, or 500 components"

Haris Memic  Dec 10, 2016  Jan 13, 2017
Printed
Page 163
line 1

"The figure includes 3 of the 100 measurements from X for reference." What is the 'X' here? I could not find 'X' around here.

Note from the Author or Editor:
"from X for reference" should be "from the mixed measurements X for reference". Page 165, under figure 3-19.

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
Printed
Page 164
last sentence

'handwritten digit between 0 and 1.' should be 'handwritten digit between 0 and 9.'

Ricky Park  Jan 05, 2017  Jan 13, 2017
Printed
Page 165
first paragraph

"and color each dot by its class": in Figure 3-21 there are no dots, but numbers.

Note from the Author or Editor:
"and color each dot by its class" should be "and represent each sample with a digit corresponding to its class". Page 167, under figure 3-20

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
Printed
Page 169
in the code/ in the graph

plt.xlabel("t-SNE feature 0") plt.xlabel("t-SNE feature 1") the second xlabel is overwriting the first and should be ylabel

Note from the Author or Editor:
Correct.

Anonymous  May 06, 2018  Oct 19, 2018
Printed
Page 189
last paragraph

In the 2nd line, "If you decrease min_samples, ..." should be "If you increase min_samples, ..." because increasing min_samples turns more points into noise.

Ricky Park  Jan 05, 2017  Jan 13, 2017
Printed
Page 191
2nd paragraph

last sentence, "..., which both provide a quantitative measure between 0 and 1." but actually ARI can return -1 ~ 1.

Note from the Author or Editor:
should be "which both provide a quantitative measure with an optimum of one and a value of zero for unrelated clusterings (though the ARI can become negative)."

Ricky Park  Jan 05, 2017  Jan 13, 2017
Printed
Page 202
Last input box of page

The different one-hot encodings of categorical and integer features should be shown. The preface on the page implies a mapping of each category to one and only one integer equivalent. The initial definition, i.e. the mapping from categories to integers, is wrong: data points in category 'socks' were given different integer representations, while they should get the same one. On the other side, two different categories ('box', 'fox') get the same integer representation.

Note from the Author or Editor:
The integer feature and categorical feature were not intended to represent the same feature in this example. However, that's not very clear and the example could certainly do with some explanation.

Anonymous  Oct 02, 2018  Oct 19, 2018
Printed
Page 204
1st paragraph

In this paragraph, the author was talking about agglomerative clustering. But in the last sentence of this paragraph, the author wrote "This is not surprising, given the results of DBSCAN, which tried to cluster all points together." I believe that, in this sentence, "DBSCAN" should be replaced by "agglomerative clustering".

Note from the Author or Editor:
I was trying to relate the results of Agglomerative clustering to the results of DBSCAN with this sentence. That would be more clear if it said "given the result of DBSCAN that we observed earlier, which ..."

Jun-Lin Lin  Nov 07, 2017  Oct 19, 2018
Printed
Page 209
Table 3-1

estimator.predict(X_text) should be estimator.predict(X_test)

Ricky Park  Jan 05, 2017  Jan 13, 2017
PDF
Page 211
final paragraph

"In Table 3-1, X_train and y_train refer to the training data and training labels, while X_test and y_test refer to the test data and test labels (if applicable)." However y_test is not in the table. It would be if you put the "score" method in the table.

Note from the Author or Editor:
Should be "while X_test refers to the test data (if applicable)."

Anonymous  May 14, 2017  Jun 09, 2017
Printed
Page 221
below Out[12]

"... with feature values -3 to -2.6, ... with feature values from -2.68 to -2.37, and so on." should be "... with feature values -3 to -2.4, ... with feature values from -2.4 to -1.8, and so on."

Haesun Park  Jan 19, 2017  Jun 09, 2017
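
The corrected bin boundaries are consistent with ten equal-width bins over the interval from -3 to 3. That binning is an assumption made for this sketch, and the helper name is hypothetical.

```python
def equal_width_edges(start, stop, n_bins):
    """Return the n_bins + 1 edges of equal-width bins on [start, stop]."""
    width = (stop - start) / n_bins
    # Round to suppress floating-point noise in the printed edges
    return [round(start + i * width, 10) for i in range(n_bins + 1)]


# Assumed setup: 10 bins between -3 and 3, giving a bin width of 0.6
edges = equal_width_edges(-3.0, 3.0, 10)
first_bin = (edges[0], edges[1])
second_bin = (edges[1], edges[2])
print(first_bin, second_bin)
```
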
Other Digital Version
221
the beginning of Section 3.5.2

The following three choices are implemented in scikit-learn: ward ... average ... complete... We now have a fourth choice, single (smallest minimum distance). Maybe it's worthwhile to mention it.

Hanmin Qin  Dec 17, 2018
Printed
Page 228
below Out[24]

In last sentence, "The second column has entries above 20,000 ..." should be "The second row has entries above 20,000 ..."

Haesun Park  Jan 19, 2017  Jun 09, 2017
PDF
Page 234
Input cell 37 and Figure 4-8

Log transformation is applied twice to Poisson data (In[36] and In[37] on page 234 of the PDF version) resulting in the wrong histogram in figure 4-8 (on page 235 of the PDF version). The first line of In[37] should be plt.hist(X_train_log[:, 0], bins=25, color='gray') rather than plt.hist(np.log(X_train_log[:, 0] + 1), bins=25, color='gray') because the data has already been log transformed in In[36].

Note from the Author or Editor:
First line in code In[37] should be plt.hist(X_train_log[:, 0], bins=25, color='gray')

Adel Rahmani  Nov 10, 2016  Jan 13, 2017
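
The effect of applying the log(x + 1) transform twice can be seen on a few hypothetical count values. This is a sketch of the arithmetic only, using plain Python rather than the NumPy arrays from the book.

```python
import math

# Hypothetical count values standing in for one Poisson-distributed feature
counts = [0, 1, 2, 10, 100]

# Intended transformation: log(x + 1), applied once
log_once = [math.log(c + 1) for c in counts]

# The reported bug: the same transform applied again to already-transformed data
log_twice = [math.log(v + 1) for v in log_once]

for c, once, twice in zip(counts, log_once, log_twice):
    print(c, round(once, 3), round(twice, 3))
```

Every nonzero count is compressed a second time, so a histogram of the doubly transformed values no longer shows the distribution the text describes.
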
Printed
Page 245
In[55] and below Figure 4-13

in In[55] code, "plt.figure()" can be removed And below Figure 4-13, "The R^2 is -0.03, ..." should be "The R^2 is -0.04, ..."

Haesun Park  Jan 19, 2017  Jun 09, 2017
Other Digital Version
248
In[59]

>>> plt.plot(citibike, linewidth=1) should be changed to: >>> plt.plot(citibike.index.astype("int"), citibike, linewidth=1) in order to draw a graph same as the one shown in the book(with xticks)

Note from the Author or Editor:
Thank you for reporting. This seems to be a change in a recent pandas version. I think the preferred fix is plt.xticks(xticks, xticks.strftime("%a %m-%d"), rotation=90, ha="left") plt.plot(citibike, linewidth=1)

teamclouday  Jan 06, 2019
Printed
Page 249
In[64]

plt.xlabel("Feature magnitude") plt.ylabel("Feature") should be plt.xlabel("Feature name") plt.ylabel("Feature magnitude")

Haesun Park  Jan 19, 2017  Jun 09, 2017
Printed
Page 254
first paragraph line 8.

However, when using cross-validation, each example will be in the training set exactly once: 'the training set' should be 'the test set'

Note from the Author or Editor:
Page 256, first paragraph, line 8, "training set exactly" should be "test set exactly"

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
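
The corrected statement can be checked by counting how often each sample lands in the test fold. The splitter below is a simplified, unshuffled k-fold sketch written for this illustration; it is not scikit-learn's KFold.

```python
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


test_counts = Counter()
train_counts = Counter()
for train_idx, test_idx in k_fold_indices(n_samples=10, n_folds=5):
    test_counts.update(test_idx)
    train_counts.update(train_idx)

print(dict(test_counts))
print(dict(train_counts))
```

With five folds, each sample appears in the test set exactly once and in the training set four times.
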
Printed
Page 262
comment in the list for In[20]

# evaluate the SVC on the test set 'test set' should be 'validation set'

Note from the Author or Editor:
"test set" should be "validation set", page 264 In[19], 5th line from bottom.

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
Printed
Page 263
last paragraph

(1st edition) "previous code snippet we can see that GridSearchCV selects .." but we build a grid search manually, so it's better to change like: "previous code snippet we can see that grid search selects .."

Haesun Park  Feb 25, 2017  Jun 09, 2017
Printed
Page 266
the last line

The parameters that were found are scored in the .. 'scored' should be 'stored'

Note from the Author or Editor:
Last line on page 268, "scored" should be "stored".

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
Printed
Page 273
last paragraph

(1st edition) "As our param_grid contains 36 combinations of parameters, this results in a whopping 36 * 5 * 5 = 900" but actually param_grid has 'rbf' and 'linear' kernel, so total combinations is 36 + 6 = 42 and number of models is 42 * 5 * 5 = 1050

Note from the Author or Editor:
We should use the original, simpler param_grid here. [already entered]

Haesun Park  Feb 25, 2017  Jun 09, 2017
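
The two counts in this entry can be reproduced by enumerating the grid. The six candidate values per parameter are an assumption chosen to match the 36 and 6 in the report, and count_combinations is a hypothetical helper, not a scikit-learn function.

```python
from itertools import product


def count_combinations(param_grid):
    """Count parameter settings in a dict, or in a list of dicts, of candidate values."""
    grids = param_grid if isinstance(param_grid, list) else [param_grid]
    return sum(len(list(product(*grid.values()))) for grid in grids)


# Assumed candidate values: six per parameter
values = [0.001, 0.01, 0.1, 1, 10, 100]

# Simple grid: 6 * 6 settings
simple_grid = {"C": values, "gamma": values}

# Grid with two kernels: 6 * 6 for 'rbf' plus 6 for 'linear'
kernel_grid = [
    {"kernel": ["rbf"], "C": values, "gamma": values},
    {"kernel": ["linear"], "C": values},
]

n_outer, n_inner = 5, 5  # nested cross-validation with 5 folds each
simple_fits = count_combinations(simple_grid) * n_outer * n_inner
kernel_fits = count_combinations(kernel_grid) * n_outer * n_inner
print(simple_fits, kernel_fits)
```
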
PDF
Page 273
last paragraph

"not that the entry" -> "note that the entry"

Anonymous  May 14, 2017  Jun 09, 2017
Printed
Page 287
paragraph below Out[51]:

For class 1, we get a fairly small recall, and precision is mixed. does not match the table above. It looks like 'recall' and 'precision' are swapped.

Note from the Author or Editor:
"precision" and "recall" should be swapped in that sentence. Page 289 in second print, under Out[53].

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
PDF
Page 288
first paragraph

0.13 (vs. 0.89 for the logistic regression) on the "nine" class, for the "not nine” class it is 0.90 vs. 0.99, ----> 0.10 (vs. 0.89 for the logistic regression) on the “nine” class, for the “not nine” class it is 0.91 vs. 0.99,

Note from the Author or Editor:
yes, correct change.

Anonymous  May 23, 2017  Jun 09, 2017
PDF
Page 290
paragraph next to lobster

The paragraph next to the lobster warns us not to use test sets to set decision thresholds. Ironically however, that is what the preceding few pages did. Perhaps a sentence should be added to say that that was simply for ease of demonstration. Or the preceding pages could be reworked to use the training data instead.

Note from the Author or Editor:
The paragraph should start with "For simplicity, we changed the threshold value based on test-set results in the illustration above. In practice, you need to use a hold-out validation set, not the test-set." and then remove the first sentence.

Anonymous  May 23, 2017  Jun 09, 2017
Printed
Page 292
the first paragraph, line 7

Because we need to compute the ROC curve.. 'ROC curve' should be 'precision-recall curve'.

HIDEMOTO NAKADA  Feb 01, 2017  Jun 09, 2017
Printed
Page 295
first paragraph

"Recall that because average precision is the area under a curve that goes from 0 to 1, average precision always returns a value between 0 (worst) and 1 (best)." 'average precision' should be 'AUC' here.

Note from the Author or Editor:
Should be "AUC" for both instances of "average precision".

HIDEMOTO NAKADA  Feb 02, 2017  Jun 09, 2017
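
One way to see why AUC stays between 0 and 1: the area under the ROC curve equals the fraction of positive/negative pairs in which the positive sample receives the higher score (ties counted as one half). The sketch below computes it that way; the function name and the toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


scores = [0.1, 0.2, 0.8, 0.9]
auc_best = roc_auc(scores, [0, 0, 1, 1])   # every positive outranks every negative
auc_worst = roc_auc(scores, [1, 1, 0, 0])  # every positive is outranked
auc_mixed = roc_auc(scores, [0, 1, 0, 1])
print(auc_best, auc_worst, auc_mixed)
```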
Printed
Page 301
python code and 1st paragraph

In the Python code In[69], the author used a GridSearchCV object with the parameter scoring='roc_auc'. Consequently, in the last line of the code, the value returned by the score method of this GridSearchCV object is the ROC AUC, not the accuracy. So I added three lines of code to get the accuracy using the recommended gamma=0.01:

svc = SVC(gamma=0.01)
svc.fit(X_train, y_train)
print(svc.score(X_test, y_test))

The accuracy of this model is 0.895, which is worse than that of the model selected by the GridSearchCV object with scoring='accuracy'. Thus, the statement in the last sentence of the 1st paragraph is also incorrect.

Note from the Author or Editor:
The line print("test set AUC") needs to be removed as it's redundant and the line below should say "test set AUC" instead of accuracy. I had a previous behavior of GridSearchCV in mind when I wrote this (that we changed many years ago). The statement indeed is incorrect now and the example needs to be reworked.

Jun-Lin Lin  Nov 07, 2017  Oct 19, 2018
Printed
Page 306
In[3]

(1st edition) In the second print statement, print("Best set score: ....") would be better as print("Test set score: ....").

Haesun Park  Feb 25, 2017  Jun 09, 2017
Printed
Page 312
line -2 (in In[16])

# fit the last step fit -> predict

Note from the Author or Editor:
Page 314 in new edition. Should be "# predict using the last step" in In[16]

HIDEMOTO NAKADA  Feb 02, 2017  Jun 09, 2017
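
The corrected comment makes sense once the predict logic is spelled out: every step except the last transforms the data, and the last step predicts. Below is a minimal sketch with hypothetical toy classes (AddOne, Double, IsPositive, ToyPipeline); it illustrates the idea and is not scikit-learn's actual implementation.

```python
class AddOne:
    """Toy transformer: adds 1 to every value."""
    def transform(self, X):
        return [x + 1 for x in X]


class Double:
    """Toy transformer: doubles every value."""
    def transform(self, X):
        return [2 * x for x in X]


class IsPositive:
    """Toy final estimator: predicts 1 for positive values, else 0."""
    def predict(self, X):
        return [int(x > 0) for x in X]


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples

    def predict(self, X):
        X_transformed = X
        # iterate over all but the final step and transform the data
        for name, step in self.steps[:-1]:
            X_transformed = step.transform(X_transformed)
        # predict using the last step
        return self.steps[-1][1].predict(X_transformed)


pipe = ToyPipeline([("add", AddOne()), ("double", Double()), ("clf", IsPositive())])
predictions = pipe.predict([-3, -1, 0, 2])
print(predictions)
```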
PDF
Page 312
last paragraph

The text says "select the most informative of the 10 features". That should be 10,000 features given the example above.

Note from the Author or Editor:
Just fixed that myself.

Joaquin Vanschoren  Mar 05, 2017  Jun 09, 2017
Printed
Page 313
Figure 6-3

In Figure 6-3, last step : below pipe.predict(X') : The chain flow diagram mistakenly starts with " X " instead of " X' ".

Note from the Author or Editor:
Correct, that needs to be fixed.

Aryo Zare  Aug 18, 2019
Page 319
Figure 6-3

In figure 6-3, there seems to be an inconsistency between the first step in the diagram and the call to pipe.predict(). Specifically, the first step in the diagram shows that X (without the apostrophe) is at the base of the first arrow, implying that X (without the apostrophe) is the primary input. On the other hand, pipe.predict(X') and T1.transform(X') take X' as their argument. Is this a mistake/inconsistency or am I missing something? Thanks, Joe

Note from the Author or Editor:
Thanks for reporting! It should be X' in the first step as well.

joseph guirguis  Sep 16, 2021
Printed
Page 325
3rd paragraph

(1st edition) Under section "Example Application: Sentiment Analysis of Movie Reviews": "... with a score of 6 or higher are labeled as positive, and the rest as negative." But actually in Maas et al. 2011, "A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. Neutral reviews are not included in the dataset."

Haesun Park  Mar 12, 2017  Jun 09, 2017
PDF
Page 329
In[4]

The book suggests replacing HTML line breaks with spaces, but the data (including in the book) doesn't seem to actually contain these.

Note from the Author or Editor:
It should print the zero-th document, which should include a html space (or any other that does).

Anonymous  May 21, 2017  Jun 09, 2017
PDF
Page 332
2nd paragraph

"LogisticRegresssion" -> "LogisticRegression" (2 s instead of 3 )

Note from the Author or Editor:
LogisticRegresssion should be LogisticRegression as they said.

Anonymous  Oct 06, 2016  Jan 13, 2017
Printed
Page 336
In the tfidf equation

(1st edition) In the middle, tfidf(w, d) = tf log((N+1)/(N_w+1)) + 1 should be tfidf(w, d) = tf * (log((N+1)/(N_w+1)) + 1)

Note from the Author or Editor:
Given that we used * for multiplication in the supervised chapter I agree with the suggested change.

Haesun Park  Apr 28, 2017  Jun 09, 2017
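
A small numeric check shows why the parentheses matter. The sketch below evaluates both readings of the formula using the natural logarithm; the function names and the example counts (tf = 3, N = 9, N_w = 4) are hypothetical.

```python
import math


def tfidf_corrected(tf, n_docs, n_docs_with_word):
    """tf * (log((N + 1) / (N_w + 1)) + 1), the corrected formula."""
    return tf * (math.log((n_docs + 1) / (n_docs_with_word + 1)) + 1)


def tfidf_as_printed(tf, n_docs, n_docs_with_word):
    """tf * log((N + 1) / (N_w + 1)) + 1, the formula read as printed."""
    return tf * math.log((n_docs + 1) / (n_docs_with_word + 1)) + 1


corrected = tfidf_corrected(3, 9, 4)  # 3 * (log(2) + 1)
printed = tfidf_as_printed(3, 9, 4)   # 3 * log(2) + 1
print("corrected: {:.4f}, as printed: {:.4f}".format(corrected, printed))

# For a word that appears in every document, the log term is 0:
# the corrected formula returns tf itself, the printed one always returns 1.
in_every_doc = tfidf_corrected(3, 9, 9)
```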
PDF
Page 336
note at bottom of page

The last sentence in your note on page 336 is unclear: "For this to work, you need to set min_df; otherwise, this feature will never be active during training." Considering that you're suggesting manually adding such a feature, it's unclear what would not be "activated" by adding min_df. The documentation for min_df in scikit-learn doesn't yield any obvious answers to this either.

Note from the Author or Editor:
Clarified to "You need to make sure to restrict the vocabulary in some way, otherwise no words will be "out of vocabulary" during training."

Stephen Dewey  May 17, 2017  Jun 09, 2017
Printed
Page 336
2 nd paragraph

As printed in the book:
tfidf(w,d) = tf log((N+1)/(NW+1)) + 1
The actual implementation in scikit-learn is:
tfidf(w,d) = tf * (log((N+1)/(NW+1)) + 1)

Note from the Author or Editor:
Indeed, what's printed makes little sense.

Chandra Shekhar Singh  Nov 28, 2018
Printed
Page 337
In[23]

(1st edition) TfidfVectorizer(min_df=5, norm=None) doesn't produce features like 'pokemon', 'smallville', etc. in Out[24]. Instead use TfidfVectorizer(min_df=5, norm='l2')

Note from the Author or Editor:
remove "norm=None".

Haesun Park  Mar 01, 2017  Jun 09, 2017
PDF
Page 338
final paragraph

"Both classes also apply L2 normalization after computing the tf–idf representation; in other words, they rescale the representation of each document to have Euclidean norm 1." This is advanced terminology that comes out of nowhere. It isn't described in the book and can't (at least easily) be understood from searching other resources either. It would be best to try to explain this further or simplify it.

Note from the Author or Editor:
We saw the L2 norm already in ridge-regression in Chapter 2. Let's say "Euclidean length 1" instead of "Euclidean norm 1" and add a footnote saying "This simply means each row is divided by its sum of squared entries."

Stephen  May 17, 2017  Jun 09, 2017
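
For readers unsure what rescaling to "Euclidean length 1" does in practice, here is a minimal sketch. It takes Euclidean length in its usual sense, the square root of the sum of squared entries, and divides each row by that value. The helper name l2_normalize and the example row are hypothetical.

```python
import math


def l2_normalize(row):
    """Rescale a row so that its Euclidean length is 1."""
    length = math.sqrt(sum(v * v for v in row))
    if length == 0:
        return list(row)  # an all-zero row is left unchanged
    return [v / length for v in row]


row = [3.0, 4.0, 0.0]  # Euclidean length sqrt(9 + 16) = 5
unit_row = l2_normalize(row)
unit_length = math.sqrt(sum(v * v for v in unit_row))
print(unit_row, unit_length)
```

After this rescaling, a long document and a short one with the same word proportions get the same representation.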
PDF
Page 339
middle of page

The page reads, "As you can see, there is some improvement when using tf–idf instead of just word counts." However 0.89 is the same as the score we were getting on pages 334-336.

Note from the Author or Editor:
Should be "In this case, the tf-idf transformation had no impact." (Though maybe we should remove the norm=None and get back the improvement. I need to see in detail if that's worth it. That destroys the interpretation of the highest tf-idf somewhat.)

Stephen Dewey  May 17, 2017  Jun 09, 2017
Printed
Page 345
First paragraph

The paragraph describing the topics does not correspond to the topics in the figure. Clearly topic 70 is the most important in the figure.

Andreas C Müller  Jan 12, 2017  Jun 09, 2017
Printed
Page 352
in In[49]

# pshow first two sentences pshow -> show

HIDEMOTO NAKADA  Feb 02, 2017  Jun 09, 2017
Printed
Page 355
last sentence

(1st edition) 'http://papers.nips.cc/paper/5021-di' should be 'https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf'

Haesun Park  Mar 01, 2017  Jun 09, 2017
Printed
Page 358
line -4

that might already increase response time or reduce cost. increase -> decrease

HIDEMOTO NAKADA  Feb 02, 2017  Jun 09, 2017
Printed
Page 362
2nd paragraph

(1st edition) ... the statsmodel package for Python should be ... the statsmodels package for Python

Haesun Park  Mar 12, 2017  Jun 09, 2017