Errata

Errata for Machine Learning with Python Cookbook

The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction is shown in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date submitted	Date corrected
PDF	Page 44 3rd codeblock	In section "2.4 Loading an Excel File", the 3rd code block under "Solution" reads "dataframe = pd.read_excel(url, sheetname=0, header=1)" but should read "dataframe = pd.read_excel(url, sheet_name=0, header=1)". The underscore (_) in the argument name "sheet_name" is missing. Note from the Author or Editor: "pd.read_excel(url, sheetname=0, header=1)" should be "pd.read_excel(url, sheet_name=0, header=1)".	Shawn Barar	Mar 06, 2020	Jul 02, 2020
Printed	Page 271 Polynomial kernel equation	(First Release) This resubmits an erratum that I previously submitted incorrectly. K(x_i, x_{i'}) = (1 + \sum_{j=1}^p x_{ij} x_{i' j} )^2 should be K(x_i, x_{i'}) = (r + \gamma \sum_{j=1}^p x_{ij} x_{i' j} )^d	Haesun Park	Aug 15, 2019	Jul 02, 2020
Printed	Page xii 4th bullet	(First Release) '14.7 Selecting Random Features in Random Forests' should be '14.7 Selecting Important Features in Random Forests'.	Haesun Park	Jul 29, 2019	Jul 02, 2020
Printed	Page 297 3rd line	(First Release) (3rd line on p. 297 and last line on p. 300) "typically 1" should be "typically 0". Keras initializes the bias to 0 by default.	Haesun Park	Jul 18, 2019	Jul 02, 2020
Printed	Page 338 Last code block	(First Release) "joblib.__version__" should be "sklearn.__version__".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 313 1st paragraph	(First Release) "error on both the training set and test set will tend to increase." should be "error on both the training set and test set will tend to decrease." "the training loss continues to increase" should be "the training loss continues to decrease".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 304 Under Discussion	(First Release) "5,000 binary features" should be "1,000 binary features".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 294 Code in Solution	(First Release) "# Create meanshift object" should be "# Create agglomerative clustering object".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 293 1st sentence	(First Release) "# Create meanshift object" should be "# Create DBSCAN object".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 289 3rd paragraph	(First Release) "i.e. 1, 2, and 3" should be "i.e. 0, 1, and 2".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 265 Above Discussion	(First Release) "# Create decision tree classifier object" should be "# Create logistic regression object".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 262 2nd paragraph and code in Solution	(First Release) "MNL" should be "MLR" in 2nd paragraph. "# Create decision tree classifier object" should be "# Create logistic regression object".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 248 No 3	(First Release) "x_i correctly, w_i is increased" should be "x_i correctly, w_i is decreased". "x_i incorrectly, w_i is decreased" should be "x_i incorrectly, w_i is increased".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 241 1st bullet	(First Release) "Defaults to \sqrt{p} features" should be "Defaults to p features". \sqrt{p} is the default of RandomForestClassifier.	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 240 Code in Solution	(First Release) "# Create random forest classifier object" should be "# Create random forest regressor object".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 236 Code above See Also	(First Release) "# Create decision tree classifier object using entropy" should be "# Create decision tree regressor object using mae".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 235 Code in Solution and equations in Discussion	(First Release) "# Create decision tree classifier object" should be "# Create decision tree regressor object" in the code of Solution. "\hat{y}_i" should be "\bar{y}" in MSE eq. "\hat{y}_i" should be "\bar{y}" and "predicted value" should be "mean value" in the last paragraph.	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 234 2nd paragraph	(First Release) "create splits to increase impurity" should be "create splits to decrease impurity".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 228 2nd equation and last paragraph	(First Release) "\hat{\beta_d}x_i^d" should be "\hat{\beta_d}x_1^d" in the 2nd equation. "x_0" should be "x[0]" in the last paragraph, because subscripts are used for features in the recipe.	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 226 1st paragraph	(First Release) "The effects of sugar and stir" should be "The effects of sugar and stirred".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 218 Last paragraph	(First Release) "we run the same GridSearch" should be "we run the same GridSearchCV".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 217 Code in middle of page	(First Release) "# View best model" should be "# View best n_components".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 192 3rd paragraph	(First Release) "the overall equality of a model" should be "the overall quality of a model".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 182 Code in middle of page	(First Release) "# Cross-validation technique" should be "# Performance metric". "# Use all CPU scores" should be "# Use all CPU cores".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 180 Code above Discussion	(First Release) "# Cross-validation technique" should be "# Performance metric". "# Use all CPU scores" should be "# Use all CPU cores".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 166 2nd paragraph	(First Release) "V is our d x n feature matrix(...), W is a d x r, and H is an r x n matrix" should be "V is our n x d feature matrix(...), W is a n x r, and H is an r x d matrix".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 162 1st paragraph	(First Release) "define the number of parameters" should be "define the number of components".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 170 Last equation	(First Release) "operatorname" before Var(x) should be removed.	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 142 2nd paragraph	(First Release) The text says "check out the external resources at the end of this solution", but there are no external resources in this recipe. I suggest one: bit.ly/2wgbPIS	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 104 Last paragraph	(First Release) "We can use the vocabulary_ method" should be "We can use the get_feature_names method"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 90 Paragraph below Discussion	(First Release) "the median class of the k nearest observations" should be "the most frequent class of the k nearest observations"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 30 Above Discussion	(First Release) The outputs should be changed to a table-like format, as in the other recipes in this chapter.	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 82 Last sentence	(First Release) "classes_ method" should be "classes_ attribute"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 78 Code below Solution	"KNN(k=5, verbose=0).complete(standardized_features)" should be "KNN(k=5, verbose=0).fit_transform(standardized_features)"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 59 paragraph for Outer	"return all rows in both employee_id and dataframe_sales" should be "return all rows in both dataframe_employee and dataframe_sales"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 37 Last code block	(First Release) "# Select three rows" should be "# Select four rows"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 20 Code below Discussion	(First Release) "# Generate three random integer between 1 and 10" should be "# Generate three random integer between 0 and 10"	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 16 Last paragraph	(First Release) "We can use Numpy's dot class to ..." should be "We can use Numpy's dot function to ...".	Haesun Park	Jul 13, 2019	Jul 02, 2020
Printed	Page 10 2nd paragraph, 1st sentence	reshape(-1, 1) should be reshape(1, -1). In the example, it is correctly used.	John Lee	Jun 29, 2019	Jul 02, 2020
ePub	Page 597 Recipe 15.3 Identifying the Best Neighborhood Size	There is no point in standardizing the features explicitly ("features_standardized = standardizer.fit_transform(features)") when the Pipeline already does so ("pipe = Pipeline([("standardizer", standardizer), ("knn", knn)])"). Note: The result is the same whether the features are standardized once or twice. Note from the Author or Editor: This is true. "features_standardized = standardizer.fit_transform(features)" can be removed.	Anonymous	Dec 28, 2018	Jul 02, 2020
Printed, PDF	Page 299 Line -3	In the code, the 3rd line print('"Standard deviation:", ... must be print("Standard deviation:", ... (The single quote before "Standard ... is superfluous.)	Frank Langenau	Dec 02, 2018	Jul 02, 2020
Printed	Page 67 code block before "Discussion"	The code interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False) interaction.fit_transform(features) yields an "unexpected indent" error because the last line must not be indented. The code must be as follows: interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False) interaction.fit_transform(features)	Frank Langenau	Sep 18, 2018	Jul 02, 2020
Printed	Page 62 1st line after the formula	In the 1st line after the formula, "where x is the feature vector, x'i is an individual element of feature x, and x'i is the rescaled element.", the first x'i must be only xi (with the i in every x'i and xi formatted as a subscript).	Frank Langenau	Sep 17, 2018	Jul 02, 2020
Printed	Page 37 line with "url" in Solution	The URL 'https://tinyurl.com//titanic-csv' must be 'https://tinyurl.com/titanic-csv'. (Only 1 slash before "titanic-csv")	Frank Langenau	Sep 13, 2018	Jul 02, 2020
Printed	Page 29 line -5	The URL 'https://tinyurl.com/simulated_json' must be 'https://tinyurl.com/simulated-json' (The underscore has to be replaced by a minus-sign.)	Frank Langenau	Sep 12, 2018	Jul 02, 2020
Printed	Page 29 4th line	The URL 'https://tinyurl.com/simulated_excel' must be 'https://tinyurl.com/simulated-excel' (The underscore has to be replaced by a minus-sign.)	Frank Langenau	Sep 12, 2018	Jul 02, 2020
Printed	Page 28 First solution on the page, line with "url = ..."	The URL 'https://tinyurl.com/simulated_data' must be 'https://tinyurl.com/simulated-data' (the underscore has to be changed into a minus-sign)	Frank Langenau	Sep 12, 2018	Jul 02, 2020
Printed	Page xi 2nd paragraph	The text says "you can copy and paste the code and it'll run", but no location is given for the source code. Note from the Author or Editor: Most of the code is available on ChrisAlbon.com, and readers have also made repos such as this one: https://github.com/DustinAlandzes/machine-learning-with-python-cookbook-notes	Stephen Austin	Apr 21, 2018	Jul 02, 2020
Printed, PDF, ePub, Mobi, Other Digital Version	Page 1 Preface	The preface uses a gendered pronoun, "he". Change to "they".	Chris Albon	Apr 02, 2018	Jul 02, 2020
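
The corrected polynomial kernel from the page 271 entry can be checked numerically. The following is a minimal sketch in plain Python, not code from the book: the function name polynomial_kernel, its parameter names, and the sample vectors are assumptions chosen for illustration. It suggests that the originally printed formula is the special case r = 1, \gamma = 1, d = 2 of the corrected general form.

```python
def polynomial_kernel(x, z, gamma=1.0, r=1.0, d=2):
    """Return K(x, z) = (r + gamma * sum_j x_j * z_j) ** d.

    Hypothetical helper for illustration; the names are not from the book.
    """
    if len(x) != len(z):
        raise ValueError("x and z must have the same number of features")
    # Inner product of the two feature vectors
    dot = sum(a * b for a, b in zip(x, z))
    return (r + gamma * dot) ** d


# Two made-up observations with p = 2 features each
x_i = [1.0, 2.0]
x_k = [3.0, 4.0]

# Defaults reproduce the originally printed formula: (1 + 11) ** 2
special_case = polynomial_kernel(x_i, x_k)

# Corrected general form with other settings: (2 + 0.5 * 11) ** 3
general_case = polynomial_kernel(x_i, x_k, gamma=0.5, r=2.0, d=3)

print(special_case)  # 144.0
print(general_case)  # 421.875
```

With the defaults the function matches the first-release equation, while other values of r, gamma, and d are only expressible in the corrected form.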