Errata

Errata for Introduction to Machine Learning with Python


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted By	Date Submitted	Date Corrected
	safari app (no pages available) Chapter 2, section on Lasso, just after In[39]:	"Decreasing alpha to 0.01, we obtain the solution shown as the green dots." Should be "red" dots. Note from the Author or Editor: "red dots" needs to be replaced with "upward pointing triangle", and "shown in teal" should be "shown as circles".	Thierry Herrmann	Oct 08, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 2, section on Naive Bayes Classifiers, subsection Strengths, weaknesses, and parameters, 2nd paragraph	"... performs better than BinaryNB" should be "BernoulliNB". Note from the Author or Editor: Should be BernoulliNB indeed.	Thierry Herrmann	Oct 08, 2016	Jan 13, 2017
	safari app (no pages available) in all notebook cells from In[93] up to In[97] and in the text just above In[94]	For scikit-learn 0.18, as mentioned in Chapter 1, replace the 'algorithm' parameter with 'solver' and 'l-bfgs' with 'lbfgs'. Thanks (sorry for the duplicate errata submission; I got a proxy error from O'Reilly on the first attempt). Note from the Author or Editor: As described, l-bfgs should always be lbfgs and "algorithm" in the code (or in fixed-width in the text) should always be "solver".	Thierry Herrmann	Oct 10, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 3, section "Applying PCA to the cancer dataset for visualization", just below the graph after In[17]:	"We can also see that the malignant (red) points are more spread out than the benign (blue) points" In the text, 'red' and 'blue' should be swapped to match the graph (or swap the colors in the graph) Note from the Author or Editor: Simply remove "red" and "blue" from the text.	Thierry Herrmann	Oct 13, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 3, section "Eigenfaces for feature extraction", just below Out[28]:	"The input space here is 50×37-pixel grayscale images, so directions within this space are also 50×37-pixel grayscale images" Replace 50x37 with 87x65 since people.images.shape: (3023, 87, 65) (was wondering where 5655 was coming from in pca.components_.shape: (100, 5655)) Note from the Author or Editor: Replace 50x37 with 87x65 everywhere in the text.	Thierry Herrmann	Oct 13, 2016	Jan 13, 2017
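The corrected dimensions can be checked with a little arithmetic. Flattening one 87×65 image gives 87 × 65 = 5,655 pixel values, which matches the second dimension of pca.components_. Below is a minimal sketch that uses only the shapes quoted in this erratum (no image data is loaded):

```python
# Shape reported in the erratum for people.images: (n_images, height, width)
n_images, height, width = 3023, 87, 65

# Flattening each image into one row vector gives height * width features
n_pixels = height * width
print(n_pixels)  # 5655

# With 100 principal components, pca.components_ has one row per component
components_shape = (100, n_pixels)
print(components_shape)  # (100, 5655)
```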
	safari app (no pages available) Chapter 4, section on One-Hot-Encoding, code in cell In[2]	data = pd.read_csv( "/home/andy/datasets/adult.data", header=None, index_col=False, names=['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'gender', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']) It's unlikely /home/andy/datasets/adult.data will work for people who'll copy/paste the code (also in github as of this writing). The original data set should work: data = pd.read_csv( "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", ...) (or include the dataset in github and use relative path) Note from the Author or Editor: It should be "data/adult.data" instead of "/home/andy/datasets/adult.data"	Thierry Herrmann	Oct 19, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 4, section on Univariate Nonlinear Transformations, text after Out[33]	"The value 2 seems to be the most common, with 62 appearances ..." should be 68 appearances Note from the Author or Editor: 62 should be 68.	Thierry Herrmann	Oct 19, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 4, section "Utilizing Expert Knowledge", text below figure 4.16	"The reason for this is that we encoded day of week and time of day using integers, which are interpreted as categorical variables" should be 'continuous' variables. The next sentence, saying that we do need 'categorical' variables in this case, is correct. Note from the Author or Editor: "categorical" should be "continuous"	Thierry Herrmann	Oct 19, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 5, section "Using Pipelines in Grid Searches", "Illustrating Information Leakage"	Very minor typo: the text mentions: "regression task with 100 samples and 1,000 features" but the code uses 10000 features: "X = rnd.normal(size=(100, 10000))" Note from the Author or Editor: Text should say 10000 features	Thierry Herrmann	Oct 23, 2016	Jan 13, 2017
	safari app (no pages available) Chapter 7, section "Topic Modeling and Document Clustering", text just above In[41]:	Text says: "We’ll remove words that appear in at least 20 percent of the documents, and we’ll limit the bag-of-words model to the 10,000 words that are most common after removing the top 20 percent", but the code uses max_df=.15. Note from the Author or Editor: At the bottom of page 348, before In[41]: "20 percent" in the text should be replaced by "15 percent" for both occurrences.	Thierry Herrmann	Oct 25, 2016	Jan 13, 2017
Printed	last paragraph	(1st edition) below Out[34], "kernel parameter is always set to 'rbf' (not that the entry for kernel is a list of length one)". It seems "not" is a misspelling of "note".	Haesun Park	Feb 25, 2017	Jun 09, 2017
	Chapter 2 Predicting Probabilities	"We’ve reproduced this in Figure 2-57, and we encourage you to go though the example there." Should be "through" instead of "though".	Mirwaisse DJANBAZ	Oct 22, 2017	Oct 19, 2018
	Chapter 3 APPLYING PCA TO THE CANCER DATASET FOR VISUALIZATION	"Each plot overlays two histograms, one for all of the points in the benign class (blue) and one for all the points in the malignant class (red)." Blue --> green, Red --> blue. Note from the Author or Editor: The colors should be removed given the b&w print. The legend should be sufficient explanation. Please remove the parentheses.	Mirwaisse DJANBAZ	Oct 23, 2017	Oct 19, 2018
Mobi	Page vii last paragraph	The link to "The Elements of Statistical Learning" under the text "the authors’ website." is incorrect. The correct link is https://web.stanford.edu/~hastie/pub.htm Note from the Author or Editor: It should be http://web.stanford.edu/~hastie/ElemStatLearn/	Gabor Szabo	Nov 27, 2017	Oct 19, 2018
ePub	Below figure 2-27	„Following the branches to the right, we see that worst radius <= 16.795 creates a node that contains only 8 benign but 134 malignant samples“: should be >. „Taking a left at the root, for worst radius > 16.795 we end up with 25 malignant“: should be <=. Note from the Author or Editor: Indeed, left is "true" and right is "false", so <= and > should be exchanged.	Mile Dragosavac	Dec 01, 2017	Oct 19, 2018
ePub	Below figure 2-29	„meaning we cannot say “a high value of X[0] means class 0, and a low value means class 1” (or vice versa).“ Starting from the root's perspective, and taking into account that X[0] is not relevant for splitting the data, it should be: „meaning we cannot say “a high value of X[1] means class 1, and a low value means class 0” (or vice versa).“ Note from the Author or Editor: Indeed, it should be X[1] instead of X[0].	Mile Dragosavac	Dec 01, 2017	Oct 19, 2018
Printed, PDF, ePub, Mobi, Other Digital Version	Page 11 bottom of page	Earlier versions of the book were missing "from IPython import display" in the import statements in the note at the bottom of page 11 (top of page 12 in newer versions).	Andreas C Müller	Apr 25, 2017	Jun 09, 2017
PDF	Page 12 1st paragraph under the figure	The text says: With gamma=0.05, performance drastically improves to an AUC of 0.5. In the output, the value is 0.9. As it should be, otherwise the explanation wouldn't make sense :). Note from the Author or Editor: Page 298 first paragraph should be " With gamma=0.05, performance drastically improves to an AUC of 0.9."	Joaquin Vanschoren	Feb 15, 2017	Jun 09, 2017
	Page 13 under Knowing your data	On page 13, under "Knowing your data", there are 4 questions that you propose answering before modeling. I am able to follow all of them except one (given below): "Is there missing data? Should I discard the rows with missing data or handle them differently?" Here, are you referring to samples that have missing features, and whether to discard those samples or treat them differently? Using the terms 'samples' and 'features' would be clearer to readers than 'rows' and 'data'. Note from the Author or Editor: Actually, scikit-learn has been criticized for the non-standard use of the word "sample", and data point might be a better word. I think using "row" is also a good word if it is explained as such. I agree saying "data" here is not very clear, though "missing data" is a standard term. Adding a bit of an explanation of what that means would be good in this place, though.	Sreejith Nair	Jan 19, 2019
PDF	Page 14 fourth paragraph	"is the foundation upon which machine learning is BUILD" should be "is the foundation upon which machine learning is BUILT"	A Aziz	Apr 27, 2017	Jun 09, 2017
PDF	Page 16 Jupyter Notebook	In line 5 of 1st paragraph under the topic Jupyter Notebook: The "Jypyter" Notebook makes it easy to incorporate......	Manpreet Singh	Sep 29, 2016	Sep 22, 2016
Printed	Page 16 In[17] and Out[17]	"First five columns" should be "First five raws" Note from the Author or Editor: Should be "First five rows".	HIDEMOTO NAKADA	Feb 10, 2017	Jun 09, 2017
Printed	Page 20 Code Block	If using a newer version of pandas (e.g., 0.24.1), the scatter_matrix function is in the plotting module and needs to be called like this: pd.plotting.scatter_matrix(...) Note from the Author or Editor: Which print of the book are you using? This has been corrected in more recent prints.	Cristian Varela	Feb 09, 2019
PDF	Page 34 4th paragraph	The book references "91 possible combinations of two features within those 13" and further clarifies in the footnote to use "13 choose 2". But 13 choose 2 is 78; 14 choose 2 is 91. Note from the Author or Editor: The main text should be "91 possible combinations of two features within those 13 (with replacement)". The footnote should say "This is 13 interactions for the first feature, plus 12 for the second not involving the first, plus 11 for the third, and so on. 13 + 12 + 11 + ... + 1 = 91"	Mike Hancock	Oct 18, 2016	Jan 13, 2017
PDF	Page 40 Paragraph below figure	In "In other words, using few neighbors corresponds to high model complexity (as shown on the right side of Figure 2-1), and using many neighbors corresponds to low model complexity (as shown on the left side of Figure 2-1)", left and right should be reversed.	Andreas Mueller	Jan 18, 2017	Jun 09, 2017
PDF	Page 45 1st paragraph of section "Linear models for regression"	"For regression, the general prediction formula for a linear model looks as follows: ŷ = w[0] * x[0] + w[1] * x[1] + ... + w[p] * x[p] + b Here, x[0] to x[p] denotes the features (in this example, the number of features is p)..." There are p+1 features in total. Note from the Author or Editor: Should be "the number of features is p+1".	Anonymous	Nov 11, 2016	Jan 13, 2017
Printed	Page 47 The first paragraph of "Linear regression (aka ordinary least squares)"	"The mean squared error is the sum of the squared differences between the predictions and the true values." The "mean squared error" is not the sum but the average of the squared differences. Note from the Author or Editor: Should be "The mean squared error is the sum of the squared differences between the predictions and the true values, divided by the number of samples."	HIDEMOTO NAKADA	Feb 11, 2017	Jun 09, 2017
Printed	Page 48,53,54	1st edition, 1st release: p48, above In[29], "506 samples and 105 derived features." --> "506 samples and 104 derived features." p53, under Out[36], "only 4 of the 105 features." --> "only 4 of the 104 features." p54, under Out[37], "using only 33 of the 105 features." --> "using only 33 of the 104 features." This is because load_extended_boston() does not include a bias term.	Haesun Park	Jul 17, 2017	Oct 19, 2018
Printed	Page 49 footnote	(1st edition) In the footnote about L2 regularization, "Ridge penalizes the L2 norm of coefficients" should be "Ridge penalizes the squared L2 norm of coefficients".	Haesun Park	Apr 28, 2017	Jun 09, 2017
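Both counts can be checked with a short sketch that uses only Python's standard library. Pairs of distinct features give 13 choose 2 = 78. Allowing a feature to pair with itself (with replacement) gives 91, which matches the footnote's sum 13 + 12 + ... + 1.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

n_features = 13
# Pairs of *distinct* features, i.e. "13 choose 2"
distinct_pairs = len(list(combinations(range(n_features), 2)))
# Pairs where a feature may also interact with itself (e.g. x0 * x0)
pairs_with_replacement = len(list(combinations_with_replacement(range(n_features), 2)))
# The footnote's sum 13 + 12 + ... + 1
footnote_sum = sum(range(1, n_features + 1))
print(distinct_pairs, pairs_with_replacement, footnote_sum)  # 78 91 91
```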
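Here is a minimal sketch of the prediction formula. The weights and inputs are illustrative, not taken from the book. With indices running from 0 through p, there are p + 1 weights and p + 1 features.

```python
def predict(w, x, b):
    """Linear model: y_hat = w[0]*x[0] + w[1]*x[1] + ... + w[p]*x[p] + b."""
    if len(w) != len(x):
        raise ValueError("need one weight per feature")
    return sum(wi * xi for wi, xi in zip(w, x)) + b

p = 2
w = [0.5, -1.0, 2.0]   # indices 0..p, so p + 1 = 3 weights
x = [4.0, 1.0, 0.5]    # and p + 1 = 3 features
y_hat = predict(w, x, b=1.0)
print(len(x), y_hat)  # 3 3.0
```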
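Here is a minimal sketch of the corrected definition, using made-up values. The sum of squared differences is divided by the number of samples.

```python
def mean_squared_error(y_true, y_pred):
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # "mean": the sum of squared errors divided by the number of samples
    return sum(squared_errors) / len(squared_errors)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
# squared errors are 1, 0, 4, 1: their sum is 6, their mean is 1.5
print(mean_squared_error(y_true, y_pred))  # 1.5
```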
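The counts 104 and 105 follow from the degree-2 expansion. This sketch assumes, as this erratum states, that the extended dataset contains all linear and degree-2 terms of the 13 original features but no constant column.

```python
from itertools import combinations_with_replacement

n_original = 13
linear_terms = n_original
# Degree-2 terms: squares and pairwise products, e.g. x0*x0 and x0*x1
quadratic_terms = len(list(combinations_with_replacement(range(n_original), 2)))
n_without_bias = linear_terms + quadratic_terms   # 13 + 91
n_with_bias = n_without_bias + 1                  # a constant column would add one
print(n_without_bias, n_with_bias)  # 104 105
```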
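The following sketch illustrates the distinction with an arbitrary coefficient vector. The Ridge penalty is alpha times the squared L2 norm, not alpha times the norm itself. The function names are illustrative.

```python
import math

def l2_norm(w):
    return math.sqrt(sum(wi ** 2 for wi in w))

def ridge_penalty(w, alpha):
    # Ridge adds alpha * ||w||_2^2, i.e. the *squared* L2 norm
    return alpha * l2_norm(w) ** 2

w = [3.0, 4.0]
print(l2_norm(w))             # 5.0
print(ridge_penalty(w, 1.0))  # 25.0
```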
PDF	Page 52 paragraph starts with 'Here, alpha=0.1 '	"Here, alpha=0.1 seems to be working well. We could try decreasing alpha even more to improve generalization. " 'decreasing' here should be 'increasing' since with larger alpha, we will have stronger regularization and hence better generalization. Note from the Author or Editor: "decreasing" is correct but the sentence is slightly misleading and should be rephrased, to "We could try decreasing alpha even more to improve test-set score."	Hidemoto Nakada	Jun 17, 2019
PDF	Page 55 1st paragraph (beneath the plot)	"Using alpha=0.00001, we get a model that is quite unregularized,..." Should be: "...alpha=0.0001..."	Anonymous	Nov 17, 2016	Jan 13, 2017
Printed	Page 58 3rd paragraph	"Most of the points in class 0 are at the top, and most of the points in class 1 are at the bottom" should be "Most of the points in class 0 are at the bottom, and most of the points in class 1 are at the top". Note from the Author or Editor: bottom and top should be exchanged.	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 58 Above Fig. 2-16	(1st edition) In the last sentence above Fig. 2-16, "Here is an illustration using SVC" would be better than "Here is an illustration using LinearSVC", because plot_linear_svc_regularization() uses SVC. Note from the Author or Editor: I changed the code to use LinearSVC, which makes more sense at this point in the book.	Haesun Park	Apr 28, 2017	Jun 09, 2017
Printed	Page 59 2nd paragraph	"Let's analyze LinearLogistic in more detail .." should be "Let's analyze LogisticRegression in more detail .."	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed, PDF	Page 60 Code in listing In[45]:	Value of C in label should be 0.01 i.e. label="C=0.01" instead of label="C=0.001" in the below line: plt.plot(logreg001.coef_.T, 'v', label="C=0.001") Note from the Author or Editor: Thanks! Good catch.	Anonymous	Feb 05, 2019
Printed	Page 63 Figure 2-17	the X-axis label should be 'Feature', instead of 'Coefficient index'. The same for Figure 2-18. The codes that generate these figures also need fix.	HIDEMOTO NAKADA	Mar 31, 2017	Jun 09, 2017
Printed	Page 72 1st paragraph	"Splitting the dataset vertically at x[1]=0.0596 yields the most information; it best separates the points in class 1 from the points in class 2." should be "Splitting the dataset horizontally at x[1]=0.0596 yields the most information; it best separates the points in class 0 from the points in class 1."	Haesun Park	Dec 18, 2016	Jan 13, 2017
ePub	Page 72	„The top node, also called the root, represents the whole dataset, consisting of 75 points belonging to class 0 and 75 points belonging to class 1“ Should be 50 points to each class. Dataset consists of 100 points. The right part of figure 2-24 shows the root having 50 points in each class. Note from the Author or Editor: 75->50	Mile Dragosavac	Dec 01, 2017	Oct 19, 2018
Printed	Page 77 1st paragraph, 2nd paragraph	1st paragraph: "The `n_samples` shown in each node in Figure 2-27 gives the number of samples in that node,.." should be "The `samples` shown in each node in Figure 2-27 gives the number of samples in that node,.." 2nd paragraph: "Nearly all of the benign samples end up in the second leaf from the right,.." should be "Nearly all of the benign samples end up in the second leaf from the left,.." Note from the Author or Editor: Remove "n_" before samples.	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 77 Above "Feature importance in trees" section	(1st edition) In 1st, 2nd paragraphs, I think that <=, > signs are flipped. "worst radius <= 16.795 creates a node that contains only 8 benign but 134 malignant samples. ... for worst radius > 16.795 we end up with 25 malignant and 259 benign samples" should be "worst radius > 16.795 creates a node that contains only 8 benign but 134 malignant samples. ... for worst radius <= 16.795 we end up with 25 malignant and 259 benign samples" Note from the Author or Editor: Confirmed also in the newest print.	Haesun Park	Apr 28, 2017	Jun 09, 2017
Printed	Page 78 2nd paragraph	"However, if a feature has a low feature_importance,.." should be "However, if a feature has a low feature_importances_,.." Note from the Author or Editor: It should say "has a low value in feature_importances_" with a trailing underscore.	Haesun Park	Dec 18, 2016	Jan 13, 2017
ePub	Page 78.9 Near Figure 2-5	In explaining Figure 2-5, the authors switch from describing the new data points as stars to crosses. It is very confusing. I think the authors meant to say that the new data points are stars. The authors say that but then go on to mention crosses in the figure. Note from the Author or Editor: should be stars everywhere indeed.	Anonymous	Dec 19, 2018
Printed	Page 80 line 1	"a high value of X[0] means class 0, and a low value means class 1" X[0] should be X[1]. Note from the Author or Editor: Page 82 line 1, "X[0]" should be "X[1]"	Hidemoto Nakada	Feb 01, 2017	Jun 09, 2017
	Page 83 bottom	This line of code is throwing a warning: X_train = data_train.date[:, np.newaxis] <ipython-input-22-e8d89f13cbdc>:7: FutureWarning: Support for multi-dimensional indexing (e.g. `obj[:, None]`) is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead. X_train = data_train.date[:, np.newaxis] Please provide an alternative. Thanks! Note from the Author or Editor: This should be replaced by `data_train.date.reshape(-1, 1)`.	Anonymous	Nov 06, 2021
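In recent pandas versions, Series objects have no reshape method, so the suggested data_train.date.reshape(-1, 1) may raise an AttributeError. One alternative is to convert the column to a NumPy array first. In the sketch below, data_train is a hypothetical stand-in with a float "date" column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_train: a "date" column of float years
data_train = pd.DataFrame({"date": [1957.0, 1959.0, 1960.0], "price": [1.0, 2.0, 3.0]})

# Convert the Series to a NumPy array, then add the column axis
X_train = data_train["date"].to_numpy().reshape(-1, 1)
# Equivalent: index the NumPy array (not the Series) with np.newaxis
X_alt = data_train["date"].to_numpy()[:, np.newaxis]
print(X_train.shape)  # (3, 1)
```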
	Page 87 Middle paragraph	"The trees that are built as part of the random forest are stored in the estimator_ attribute." 'estimator_' should read 'estimators_' Note from the Author or Editor: Indeed, thank you!	Anonymous	Jul 14, 2021
Printed	Page 88 5th paragraph	"max_features=sqrt(n_features) for classification and max_features=log2(n_features) for regression" In RandomForestRegressor, the max_features default is n_features, not log2(n_features). Note from the Author or Editor: It should say "max_features=n_features for regression".	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 92 4th paragraph	"You can find the details in Chapter 1 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning" should be "You can find the details in Chapter 12 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning" Note from the Author or Editor: It should be "Chapter 12"	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 94 In[78]	first comment: "# add the squared first feature" should be "# add the squared second feature"	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 95 last sentence	ax.set_zlabel("feature0 ** 2") should be ax.set_zlabel("feature1 ** 2") Note from the Author or Editor: I think that is a duplicate, but I'm not sure if this location was reported before.	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 95 last line	The last line of In[79] should be: ax.set_zlabel("feature1 ** 2") instead of: ax.set_zlabel("feature0 ** 2")	Jess D	Dec 30, 2016	Jan 13, 2017
Printed	Page 98 equation in the middle	$k_{\mathrm{rbf}}(x_1, x_2) = \exp(\gamma \lVert x_1 - x_2 \rVert^2)$ should be $k_{\mathrm{rbf}}(x_1, x_2) = \exp(-\gamma \lVert x_1 - x_2 \rVert^2)$ Note from the Author or Editor: Missing minus sign.	Haesun Park	Dec 18, 2016	Jan 13, 2017
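For reference, here is a sketch of the corrected step using NumPy. X is hypothetical two-feature data. The appended column is the square of the second feature (index 1), which is why the axis label should read feature1 ** 2.

```python
import numpy as np

# Hypothetical two-feature data (columns: feature0, feature1)
X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]])

# add the squared second feature: column 1, i.e. feature1 ** 2
X_new = np.hstack([X, X[:, 1:] ** 2])
print(X_new.shape)  # (3, 3)
```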
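Here is a minimal sketch of the corrected kernel in plain Python; the function name rbf_kernel is illustrative. With the minus sign, the kernel equals 1 for identical points and decays toward 0 as the distance grows. Without it, the value would grow without bound.

```python
import math

def rbf_kernel(x1, x2, gamma):
    # k_rbf(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5)   # 1.0
near = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=0.5)   # exp(-0.5)
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)    # exp(-12.5), close to 0
```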
Printed	Page 100 2nd paragraph	"Increasing `C`, as shown on the bottom right, allows these points to have a stronger influence on the model and makes the decision boundary bend to correctly classify them." I think "Increasing `C`, as shown on the bottom left, ..." is better. Note from the Author or Editor: "bottom right" should be replaced by "bottom left"	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 102 In "Preprocessing data for SVMs"	The book disagrees with the scikit-learn website on how to scale data for SVMs. It should be explained more clearly that the right choice of scaling depends on the data and the model, and that StandardScaler would also be a valid approach.	Andreas C Müller	Oct 30, 2017	Oct 19, 2018
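For comparison, here is a sketch of the two scalings in plain Python. The function names are illustrative, not scikit-learn's API. Min-max scaling maps a feature to [0, 1]. Standardization, which is what StandardScaler does, gives zero mean and unit variance.

```python
import statistics

def min_max_scale(column):
    """Map values linearly so the minimum becomes 0 and the maximum 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def standard_scale(column):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

feature = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled_minmax = min_max_scale(feature)   # [0.0, 0.25, 0.5, 0.75, 1.0]
scaled_std = standard_scale(feature)     # mean 0, standard deviation 1
```

Which scaling works better depends on the data and the model, as the erratum says.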
Printed	Page 107 equation in the middle	h[0] = tanh(w[0,0]x[0] + w[1,0]x[1] + w[2,0]x[2] + w[3,0]x[3]) h[1] = tanh(w[0,0]x[0] + w[1,0]x[1] + w[2,0]x[2] + w[3,0]x[3]) h[2] = tanh(w[0,0]x[0] + w[1,0]x[1] + w[2,0]x[2] + w[3,0]x[3]) y_hat = v[0]h[0] + v[1]h[1] + v[2]h[2] I think it should be h[0] = tanh(w[0,0]x[0] + w[1,0]x[1] + w[2,0]x[2] + w[3,0]x[3]+b[0]) h[1] = tanh(w[0,1]x[0] + w[1,1]x[1] + w[2,1]x[2] + w[3,1]x[3]+b[1]) h[2] = tanh(w[0,2]x[0] + w[1,2]x[1] + w[2,2]x[2] + w[3,2]x[3]+b[2]) y_hat = v[0]h[0] + v[1]h[1] + v[2]h[2]+b	Haesun Park	Dec 18, 2016	Jan 13, 2017
PDF	Page 107 Formulas in middle	I think this paragraph: h[0] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) h[1] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) h[2] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) should rather be: h[0] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3]) h[1] = tanh(w[0, 1] * x[0] + w[1, 1] * x[1] + w[2, 1] * x[2] + w[3, 1] * x[3]) h[2] = tanh(w[0, 2] * x[0] + w[1, 2] * x[1] + w[2, 2] * x[2] + w[3, 2] * x[3]) Note from the Author or Editor: Indeed that's a pretty clear mistake.	Abraham Louw	Apr 17, 2019
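Here is a sketch of the corrected formulas in plain Python. It assumes w is indexed as w[input][hidden unit] and that b holds one bias per hidden unit; all names and values are illustrative. Each hidden unit j reads its own column of weights, so the three hidden units differ.

```python
import math

def hidden_layer(x, w, b):
    """Return h with h[j] = tanh(sum_i w[i][j] * x[i] + b[j]).

    w is indexed as w[input][hidden unit], so hidden unit j uses column j.
    """
    return [
        math.tanh(sum(w[i][j] * x[i] for i in range(len(x))) + b[j])
        for j in range(len(b))
    ]

def output_unit(h, v, b_out):
    """y_hat = v[0]*h[0] + v[1]*h[1] + v[2]*h[2] + b"""
    return sum(vj * hj for vj, hj in zip(v, h)) + b_out

# Illustrative weights: 4 inputs -> 3 hidden units -> 1 output
w = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]
b = [0.0, 0.0, 0.0]
v = [1.0, -1.0, 0.5]
x = [1.0, 0.0, 0.0, 0.0]   # only the first input is active

h = hidden_layer(x, w, b)  # picks up w[0][0], w[0][1], w[0][2] respectively
y_hat = output_unit(h, v, b_out=0.0)
```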
Printed	Page 110 1st paragraph	"If we want a smoother decision boundary, we could add more hidden units (as in Figure 2-49), add a second hidden layer (Figure 2-50)" should be "If we want a smoother decision boundary, we could add more hidden units (as in Figure 2-48), add a second hidden layer (Figure 2-50)"	Haesun Park	Dec 18, 2016	Jan 13, 2017
Printed	Page 118 3rd paragraph	"you are learning 100 * 1,000 = 100,000 weights from the input to the hidden layer and 1,000 x 1 weights from the hidden layer to the output layer" I think it's better than above: "you are learning 100 * 1,000 = 100,000 weights from the input to the hidden layer and 1,000 * 1 = 1,000 weights from the hidden layer to the output layer" Note from the Author or Editor: change "x" to "*"	Haesun Park	Dec 18, 2016	Jan 13, 2017
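Here is a sketch of the weight count for fully connected layers. Biases are excluded, as in the quoted sentence; count_weights is a hypothetical helper.

```python
def count_weights(layer_sizes):
    """Weights between consecutive fully connected layers (biases excluded)."""
    return [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# 100 input features -> 1,000 hidden units -> 1 output
per_layer = count_weights([100, 1000, 1])
print(per_layer, sum(per_layer))  # [100000, 1000] 101000
```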
Printed	Page 118 3rd paragraph	"introspect" is used instead of "inspect". Note from the Author or Editor: Inspect is probably better here.	Gabriela Hempfling	May 06, 2018	Oct 19, 2018
Printed	Page 119 In[105]	from sklearn.datasets import make_blobs, make_circles should be from sklearn.datasets import make_circles	Haesun Park	Dec 18, 2016	Jan 13, 2017
PDF	Page 142 Applying PCA to the cancer dataset for visualization, 1st paragraph	"This dataset has 30 features, which would result in 30 * 14 = 420 scatter plots!" Why did you multiply by 14? Note from the Author or Editor: It should actually be 29 * 15, I realize now. All possible combinations of features are n * (n-1)/2, so 30 * 29 / 2 = 435. In a scatter matrix, the diagonal does not contain pairwise plots, and the upper and lower triangles are transposes of each other. So to show all pairwise interactions, we need to plot all the plots in either the upper or lower triangle of the scatter matrix. If we plot the whole scatter matrix, obviously we'd need 30 * 30 plots.	Anonymous	Aug 25, 2017	Oct 19, 2018
Printed	Page 147 1st paragraph	At the end of the first sentence, "(it's negative," should be "(it's positive", because all features of the first component have positive values.	Haesun Park	Jan 04, 2017	Jan 13, 2017
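Here is a sketch of the corrected count. Thirty features give 30 × 29 / 2 = 435 distinct pairs, which fill one triangle of the scatter matrix. The full matrix has 30 × 30 = 900 cells.

```python
from math import comb

n_features = 30
pairs = comb(n_features, 2)             # distinct unordered feature pairs
full_matrix = n_features * n_features   # every cell of a full scatter matrix
print(pairs, full_matrix)  # 435 900
```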
Printed	Page 151 Out[27]	X_train_pca.shape: (1537, 100) should be X_train_pca.shape: (1547, 100)	Haesun Park	Jan 04, 2017	Jan 13, 2017
Printed	Page 154 Last sentence of the 1st paragraph	Following sentence: "Here, we visualize the reconstruction of some faces using 10, 50, 100, 500, or 2,000 components" Should be changed to: "Here, we visualize the reconstruction of some faces using 10, 50, 100, or 500 components"	Haris Memic	Dec 10, 2016	Jan 13, 2017
Printed	Page 163 line 1	"The figure includes 3 of the 100 measurements from X for reference.’’ What is the 'X' here? I could not find 'X' around here. Note from the Author or Editor: "from X for reference" should be "from the mixed measurements X for reference". Page 165, under figure 3-19.	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
Printed	Page 164 last sentence	'handwritten digit between 0 and 1.' should be 'handwritten digit between 0 and 9.'	Ricky Park	Jan 05, 2017	Jan 13, 2017
Printed	Page 165 first paragraph	"and color each dot by its class": in Figure 3-21 there are no dots, only numbers. Note from the Author or Editor: "and color each dot by its class" should be "and represent each sample with a digit corresponding to its class". Page 167, under figure 3-20.	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
Printed	Page 169 in the code/ in the graph	plt.xlabel("t-SNE feature 0") plt.xlabel("t-SNE feature 1") the second xlabel is overwriting the first and should be ylabel Note from the Author or Editor: Correct.	Anonymous	May 06, 2018	Oct 19, 2018
Printed	Page 189 last paragraph	In the 2nd line, "If you decrease min_samples, ..." should be "If you increase min_samples, ...", because increasing min_samples turns more points into noise.	Ricky Park	Jan 05, 2017	Jan 13, 2017
Printed	Page 191 2nd paragraph	In the last sentence, "..., which both provide a quantitative measure between 0 and 1.", but the ARI can actually return negative values. Note from the Author or Editor: should be "which both provide a quantitative measure with an optimum of one and a value of zero for unrelated clusterings (though the ARI can become negative)."	Ricky Park	Jan 05, 2017	Jan 13, 2017
Printed	Page 202 Last input box of page	The different one-hot encodings of categorical and integer features should be shown. The preface on the page implies a mapping of each category to one and only one integer equivalent, but the mapping from categories to integers in the example is inconsistent: data points in the category 'socks' are given different integer representations, while they should get the same one. On the other hand, two different categories ('box', 'fox') get the same integer representation. Note from the Author or Editor: The integer feature and categorical feature were not intended to represent the same feature in this example. However, that's not very clear and the example could certainly do with some explanation.	Anonymous	Oct 02, 2018	Oct 19, 2018
Printed	Page 204 1st paragraph	In this paragraph, the author was talking about agglomerative clustering. But in the last sentence of this paragraph, the author wrote "This is not surprising, given the results of DBSCAN, which tried to cluster all points together." I believe that, in this sentence, "DBSCAN" should be replaced by "agglomerative clustering". Note from the Author or Editor: I was trying to relate the results of agglomerative clustering to the results of DBSCAN with this sentence. That would be clearer if it said "given the result of DBSCAN that we observed earlier, which ..."	Jun-Lin Lin	Nov 07, 2017	Oct 19, 2018
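The following from-scratch sketch of the adjusted Rand index shows that the ARI can go below zero. The function name is hypothetical; for real use, scikit-learn provides adjusted_rand_score. Two clusterings that disagree on every pair of points score below zero, while a relabeled but identical grouping scores 1.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(labels_true)
    # Pairs of points grouped together in both clusterings
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together in each clustering separately
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        raise ValueError("degenerate labelings (e.g. a single cluster) are not handled")
    return (sum_cells - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # same grouping, renamed: 1.0
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # every pair disagrees: about -0.5
```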
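Here is a sketch of what the example is meant to show, assuming the two columns are independent features; the data values are illustrative. By default, pd.get_dummies encodes only string, object, or category columns, so an integer column has to be listed in columns= to be one-hot encoded.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "Integer Feature": [0, 1, 2, 1],
    "Categorical Feature": ["socks", "fox", "socks", "box"],
})

# Default: only the string column is expanded; the integer column is kept as-is
default_encoded = pd.get_dummies(demo_df)
# Listing both columns one-hot encodes the integer feature as well
encoded = pd.get_dummies(demo_df, columns=["Integer Feature", "Categorical Feature"])
print(list(encoded.columns))
```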
Printed	Page 209 Table 3-1	estimator.predict(X_text) should be estimator.predict(X_test)	Ricky Park	Jan 05, 2017	Jan 13, 2017
PDF	Page 211 final paragraph	"In Table 3-1, X_train and y_train refer to the training data and training labels, while X_test and y_test refer to the test data and test labels (if applicable)." However y_test is not in the table. It would be if you put the "score" method in the table. Note from the Author or Editor: Should be "while X_test refers to the test data (if applicable)."	Anonymous	May 14, 2017	Jun 09, 2017
	Page 212 In[35]	Chinese version: "best_parms = {}" I think it should be "best_params = {}". Also, the return statement has an indentation error. Note from the Author or Editor: This mistake is also present in the English version.	Alice	Jul 28, 2021
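Here is a sketch of the corrected pattern, assuming the snippet is a nested-loop parameter search. validation_score is a hypothetical stand-in for fitting a model and scoring it on a validation set. Note the spelling best_params, and that the return statement sits after both loops have finished.

```python
def validation_score(gamma, C):
    # Hypothetical score, highest at gamma=0.1, C=10
    return -((gamma - 0.1) ** 2) - ((C - 10) ** 2) / 1000

def manual_grid_search(gammas, Cs):
    best_score = float("-inf")
    best_params = {}
    for gamma in gammas:
        for C in Cs:
            score = validation_score(gamma, C)
            if score > best_score:
                best_score = score
                best_params = {"C": C, "gamma": gamma}
    return best_score, best_params  # only after every combination was tried

grid = [0.001, 0.01, 0.1, 1, 10, 100]
best_score, best_params = manual_grid_search(grid, grid)
print(best_params)  # {'C': 10, 'gamma': 0.1}
```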
Printed	Page 221 below Out[12]	"... with feature values -3 to -2.6, ... with feature values from -2.68 to -2.37, and so on." should be "... with feature values -3 to -2.4, ... with feature values from -2.4 to -1.8, and so on."	Haesun Park	Jan 19, 2017	Jun 09, 2017
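The corrected boundaries correspond to 10 equal-width bins between -3 and 3, with 11 edges spaced 0.6 apart, as np.linspace(-3, 3, 11) would produce. Here is a plain-Python sketch under that assumption:

```python
n_edges = 11
low, high = -3.0, 3.0
step = (high - low) / (n_edges - 1)   # 0.6
# round() removes floating-point noise such as -2.4000000000000004
edges = [round(low + i * step, 10) for i in range(n_edges)]
print(edges[:3])  # [-3.0, -2.4, -1.8]
```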
Other Digital Version	221 the beginning of Section 3.5.2	The following three choices are implemented in scikit-learn: ward ... average ... complete... We now have a fourth choice, single (smallest minimum distance). Maybe it's worthwhile to mention it.	Hanmin Qin	Dec 17, 2018
Printed	Page 228 below Out[24]	In last sentence, "The second column has entries above 20,000 ..." should be "The second row has entries above 20,000 ..."	Haesun Park	Jan 19, 2017	Jun 09, 2017
PDF	Page 234 Input cell 37 and Figure 4-8	The log transformation is applied twice to the Poisson data (In[36] and In[37] on page 234 of the PDF version), resulting in the wrong histogram in Figure 4-8 (on page 235 of the PDF version). The first line of In[37] should be plt.hist(X_train_log[:, 0], bins=25, color='gray') rather than plt.hist(np.log(X_train_log[:, 0] + 1), bins=25, color='gray'), because the data has already been log transformed in In[36]. Note from the Author or Editor: The first line in code In[37] should be plt.hist(X_train_log[:, 0], bins=25, color='gray')	Adel Rahmani	Nov 10, 2016	Jan 13, 2017
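Here is a sketch with hypothetical count data. The log(x + 1) transform should be applied once, and that result is what gets plotted. Transforming it a second time compresses the values further and changes the histogram.

```python
import math

counts = [0, 1, 3, 10, 50]                       # hypothetical Poisson-like counts
X_train_log = [math.log(c + 1) for c in counts]  # transform applied once

# Plot X_train_log directly; this second transform is the mistake
twice = [math.log(v + 1) for v in X_train_log]
print(X_train_log[-1], twice[-1])  # the doubly transformed value is much smaller
```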
Printed	Page 245 In[55] and below Figure 4-13	in In[55] code, "plt.figure()" can be removed And below Figure 4-13, "The R^2 is -0.03, ..." should be "The R^2 is -0.04, ..."	Haesun Park	Jan 19, 2017	Jun 09, 2017
Other Digital Version	248 In[59]	>>> plt.plot(citibike, linewidth=1) should be changed to: >>> plt.plot(citibike.index.astype("int"), citibike, linewidth=1) in order to draw a graph same as the one shown in the book(with xticks) Note from the Author or Editor: Thank you for reporting. This seems to be a change in a recent pandas version. I think the preferred fix is plt.xticks(xticks, xticks.strftime("%a %m-%d"), rotation=90, ha="left") plt.plot(citibike, linewidth=1)	teamclouday	Jan 06, 2019
Printed	Page 249 In[64]	plt.xlabel("Feature magnitude") plt.ylabel("Feature") should be plt.xlabel("Feature name") plt.ylabel("Feature magnitude")	Haesun Park	Jan 19, 2017	Jun 09, 2017
Printed	Page 254 first paragraph line 8.	However, when using cross-validation, each example will be in the training set exactly once: 'the training set' should be 'the test set' Note from the Author or Editor: Page 256, first paragraph, line 8, "training set exactly" should be "test set exactly"	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
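The following plain-Python sketch of unshuffled k-fold splitting confirms that every sample lands in the test set exactly once and in the training set k - 1 times. kfold_indices is a hypothetical helper, not scikit-learn's KFold.

```python
from collections import Counter

def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for unshuffled k-fold cross-validation."""
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# 10 samples, 3 folds (sizes 4, 3, 3)
test_counts = Counter(i for _, test in kfold_indices(10, 3) for i in test)
train_counts = Counter(i for train, _ in kfold_indices(10, 3) for i in train)
print(set(test_counts.values()), set(train_counts.values()))  # {1} {2}
```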
Printed	Page 262 comment in the list for In[20]	# evaluate the SVC on the test set 'test set' should be 'validation set' Note from the Author or Editor: "test set" should be "validation set", page 264 In[19], 5th line from bottom.	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
Printed	Page 263 last paragraph	(1st edition) "previous code snippet we can see that GridSearchCV selects .." but we build a grid search manually, so it's better to change like: "previous code snippet we can see that grid search selects .."	Haesun Park	Feb 25, 2017	Jun 09, 2017
Printed	Page 266 the last line	The parameters that were found are scored in the .. 'scored' should be 'stored' Note from the Author or Editor: Last line on page 268, "scored" should be "stored".	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
Printed	Page 273 last paragraph	(1st edition) "As our param_grid contains 36 combinations of parameters, this results in a whopping 36 * 5 * 5 = 900" but actually param_grid has 'rbf' and 'linear' kernel, so total combinations is 36 + 6 = 42 and number of models is 42 * 5 * 5 = 1050 Note from the Author or Editor: We should use the original, simpler param_grid here. [already entered]	Haesun Park	Feb 25, 2017	Jun 09, 2017
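Here is a sketch of the corrected count. It assumes six candidate values each for C and gamma (consistent with 36 = 6 × 6) and 5-fold cross-validation in both the inner and the outer loop; n_combinations is a hypothetical helper.

```python
from itertools import product

values = [0.001, 0.01, 0.1, 1, 10, 100]
param_grid = [
    {"kernel": ["rbf"], "C": values, "gamma": values},   # 1 * 6 * 6 = 36
    {"kernel": ["linear"], "C": values},                 # 1 * 6 = 6
]

def n_combinations(grid):
    """Count the parameter settings in a list-of-dicts grid."""
    return sum(len(list(product(*sub_grid.values()))) for sub_grid in grid)

n_settings = n_combinations(param_grid)
n_models = n_settings * 5 * 5   # 5 outer folds x 5 inner folds
print(n_settings, n_models)  # 42 1050
```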
PDF	Page 273 last paragraph	"not that the entry" -> "note that the entry"	Anonymous	May 14, 2017	Jun 09, 2017
Printed	Page 287 paragraph below Out[51]:	For class 1, we get a fairly small recall, and precision is mixed. does not match the table above. It looks like 'recall' and 'precision' are swapped. Note from the Author or Editor: "precision" and "recall" should be swapped in that sentence. Page 289 in second print, under Out[53].	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
PDF	Page 288 first paragraph	0.13 (vs. 0.89 for the logistic regression) on the "nine" class, for the "not nine" class it is 0.90 vs. 0.99, -> 0.10 (vs. 0.89 for the logistic regression) on the "nine" class, for the "not nine" class it is 0.91 vs. 0.99, Note from the Author or Editor: yes, correct change.	Anonymous	May 23, 2017	Jun 09, 2017
PDF	Page 290 paragraph next to lobster	The paragraph next to the lobster warns us not to use test sets to set decision thresholds. Ironically, however, that is what the preceding few pages did. Perhaps a sentence should be added to say that this was simply for ease of demonstration. Or the preceding pages could be reworked to use the training data instead. Note from the Author or Editor: The paragraph should start with "For simplicity, we changed the threshold value based on test-set results in the illustration above. In practice, you need to use a hold-out validation set, not the test-set." and then remove the first sentence.	Anonymous	May 23, 2017	Jun 09, 2017
Printed	Page 292 the first paragraph, line 7	Because we need to compute the ROC curve.. 'ROC curve' should be 'precision-recall curve'.	HIDEMOTO NAKADA	Feb 01, 2017	Jun 09, 2017
Printed	Page 295 first paragraph	Recall that because average precision is the area under a curve that goes from 0 to 1, average precision always returns a value between 0 (worst) and 1 (best). 'average precision' should be 'AUC' here. Note from the Author or Editor: Should be "AUC" for both instances of "average precision".	HIDEMOTO NAKADA	Feb 02, 2017	Jun 09, 2017
Printed	Page 301 python code and 1st paragraph	In the Python code In[69], the author used a GridSearchCV object with scoring='roc_auc'. Consequently, in the last line of the code, the value returned by the score method of this GridSearchCV object is the ROC AUC, not the accuracy. So I added three lines of code to get the accuracy using the recommended gamma=0.01: svc = SVC(gamma=0.01); svc.fit(X_train, y_train); print(svc.score(X_test, y_test)). The accuracy of this model is 0.895, which is worse than that of the model selected by the GridSearchCV object with scoring='accuracy'. Thus, the statement in the last sentence of the 1st paragraph is also incorrect. Note from the Author or Editor: The line print("test set AUC") needs to be removed as it's redundant, and the line below should say "test set AUC" instead of accuracy. I had a previous behavior of GridSearchCV in mind when I wrote this (one that we changed many years ago). The statement indeed is incorrect now and the example needs to be reworked.	Jun-Lin Lin	Nov 07, 2017	Oct 19, 2018
Printed	Page 306 In[3]	(1st edition) In the 2nd print statement, print("Best set score: ....") would be better changed to print("Test set score: ....")	Haesun Park	Feb 25, 2017	Jun 09, 2017
Printed	Page 312 line -2 (in In[16])	# fit the last step fit -> predict Note from the Author or Editor: Page 314 in new edition. Should be "# predict using the last step" in In[16]	HIDEMOTO NAKADA	Feb 02, 2017	Jun 09, 2017
PDF	Page 312 last paragraph	The text says "select the most informative of the 10 features". That should be 10,000 features given the example above. Note from the Author or Editor: Just fixed that myself.	Joaquin Vanschoren	Mar 05, 2017	Jun 09, 2017
Printed	Page 313 Figure 6-3	In Figure 6-3, last step : below pipe.predict(X') : The chain flow diagram mistakenly starts with " X " instead of " X' ". Note from the Author or Editor: Correct, that needs to be fixed.	Aryo Zare	Aug 18, 2019
	Page 319 Figure 6-3	In Figure 6-3, there seems to be an inconsistency between the first step in the diagram and the call to pipe.predict(). Specifically, the first step in the diagram shows that X (without the apostrophe) is at the base of the first arrow, implying that X (without the apostrophe) is the primary input. On the other hand, pipe.predict(X') and T1.transform(X') take X' as their argument. Is this a mistake/inconsistency or am I missing something? Thanks, Joe Note from the Author or Editor: Thanks for reporting! It should be X' in the first step as well.	joseph guirguis	Sep 16, 2021
Printed	Page 325 3rd paragraph	(1st edition) Under section "Example Application: Sentiment Analysis of Movie Reviews": "... with a score of 6 or higher are labeled as positive, and the rest as negative." But actually in Maas et al. 2011, "A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. Neutral reviews are not included in the dataset."	Haesun Park	Mar 12, 2017	Jun 09, 2017
PDF	Page 329 In[4]	The book suggests replacing HTML line breaks with spaces, but the data (including in the book) doesn't seem to actually contain these. Note from the Author or Editor: It should print the zeroth document, which should include an HTML line break (or any other document that does).	Anonymous	May 21, 2017	Jun 09, 2017
PDF	Page 332 2nd paragraph	"LogisticRegresssion" -> "LogisticRegression" (2 s instead of 3 ) Note from the Author or Editor: LogisticRegresssion should be LogisticRegression as they said.	Anonymous	Oct 06, 2016	Jan 13, 2017
Printed	Page 336 In the tfidf equation	(1st edition) In the middle, tfidf(w, d) = tf log((N+1)/(N_w+1)) + 1 should be tfidf(w, d) = tf * (log((N+1)/(N_w+1)) + 1) Note from the Author or Editor: Given that we used * for multiplication in the supervised chapter I agree with the suggested change.	Haesun Park	Apr 28, 2017	Jun 09, 2017
PDF	Page 336 note at bottom of page	The last sentence in your note on page 336 is unclear: "For this to work, you need to set min_df; otherwise, this feature will never be active during training." Considering that you're suggesting manually adding such a feature, it's unclear what would not be "activated" by adding min_df. The documentation for min_df in scikit-learn doesn't yield any obvious answers to this either. Note from the Author or Editor: Clarified to "You need to make sure to restrict the vocabulary in some way; otherwise, no words will be 'out of vocabulary' during training."	Stephen Dewey	May 17, 2017	Jun 09, 2017
Printed	Page 336 2nd paragraph	tfidf(w, d) = tf log((N+1)/(N_w+1)) + 1 as printed in the book. The actual implementation in scikit-learn is tfidf(w, d) = tf * (log((N+1)/(N_w+1)) + 1). Note from the Author or Editor: Indeed, what's printed makes little sense.	Chandra Shekhar Singh	Nov 28, 2018
Printed	Page 337 In[23]	(1st edition) TfidfVectorizer(min_df=5, norm=None) doesn't produce features like 'pokemon', 'smallville', etc. in Out[24]. Instead use TfidfVectorizer(min_df=5, norm='l2'). Note from the Author or Editor: remove "norm=None".	Haesun Park	Mar 01, 2017	Jun 09, 2017
PDF	Page 338 final paragraph	"Both classes also apply L2 normalization after computing the tf–idf representation; in other words, they rescale the representation of each document to have Euclidean norm 1." This is advanced terminology that comes out of nowhere. It isn't described in the book and can't (at least easily) be understood from searching other resources either. It would be best to try to explain this further or simplify it. Note from the Author or Editor: We saw the L2 norm already in ridge-regression in Chapter 2. Let's say "Euclidean length 1" instead of "Euclidean norm 1" and add a footnote saying "This simply means each row is divided by its sum of squared entries."	Stephen	May 17, 2017	Jun 09, 2017
PDF	Page 339 middle of page	The page reads, "As you can see, there is some improvement when using tf–idf instead of just word counts." However, 0.89 is the same as the score we were getting on pages 334-336. Note from the Author or Editor: Should be "In this case, the tf-idf transformation had no impact." (Though maybe we should remove the norm=None and get back the improvement. I need to see in detail if that's worth it, as it somewhat destroys the interpretation of the highest tf-idf.)	Stephen Dewey	May 17, 2017	Jun 09, 2017
Printed	Page 345 First paragraph	The paragraph describing the topics does not correspond to the topics in the figure. Clearly topic 70 is the most important in the figure.	Andreas C Müller	Jan 12, 2017	Jun 09, 2017
Printed	Page 352 in In[49]	# pshow first two sentences pshow -> show	HIDEMOTO NAKADA	Feb 02, 2017	Jun 09, 2017
Printed	Page 355 last sentence	(1st edition) 'http://papers.nips.cc/paper/5021-di' should be 'https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf'	Haesun Park	Mar 01, 2017	Jun 09, 2017
Printed	Page 358 line -4	that might already increase response time or reduce cost. increase -> decrease	HIDEMOTO NAKADA	Feb 02, 2017	Jun 09, 2017
Printed	Page 362 2nd paragraph	(1st edition) ... the `statsmodel` package for Python should be ... the `statsmodels` package for Python	Haesun Park	Mar 12, 2017	Jun 09, 2017
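The corrected tf-idf formula from the page 336 entries, tfidf(w, d) = tf * (log((N+1)/(N_w+1)) + 1), together with the L2 normalization discussed for page 338, can be checked with a small sketch. It assumes scikit-learn's default TfidfVectorizer behavior (smooth_idf=True, norm='l2', raw counts as tf, natural logarithm) and a tokenizer modeled on its default token_pattern; the helpers tokenize and tfidf are hypothetical names, not part of the book or of scikit-learn. The two sentences are the toy documents from the book's bag-of-words example.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    # Lowercase and keep tokens of two or more word characters
    # (assumption: mirrors scikit-learn's default token_pattern r"(?u)\b\w\w+\b").
    return re.findall(r"\b\w\w+\b", doc.lower())


def tfidf(docs):
    """Return (vocabulary, rows): one L2-normalized tf-idf row per document.

    tfidf(w, d) = tf * (log((N + 1) / (N_w + 1)) + 1), where tf is the raw
    count of w in d, N the number of documents, N_w the number containing w.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    n_docs = len(docs)
    # N_w: number of documents in which each word appears
    doc_freq = {w: sum(1 for c in counts if w in c) for w in vocabulary}
    idf = {w: math.log((n_docs + 1) / (doc_freq[w] + 1)) + 1 for w in vocabulary}
    rows = []
    for c in counts:
        # tf multiplies the whole parenthesized idf term, "+ 1" included
        row = [c[w] * idf[w] for w in vocabulary]
        # L2 normalization: divide by the Euclidean length sqrt(sum of squares)
        length = math.sqrt(sum(v * v for v in row))
        rows.append([v / length if length else 0.0 for v in row])
    return vocabulary, rows


docs = ["The fool doth think he is wise,",
        "but the wise man knows himself to be a fool"]
vocab, rows = tfidf(docs)
print(len(vocab))  # 13
```

Here "the", "fool" and "wise" occur in both documents and get idf log(3/3) + 1 = 1, while the other words get log(3/2) + 1 ≈ 1.41. Under the misprinted reading, tf * log(...) + 1, a word absent from a document (tf = 0) would still receive weight 1, which is why only the parenthesized form makes sense.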