Errata

Introduction to Machine Learning with Python

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version | Location | Description | Submitted by | Date submitted
Safari Books Online 7
Figure 7-6. Topic weights learned by LDA

Chapter 7, Working with Text Data: Shouldn't Figure 7-6 match the output (the first two rows of each topic) given by Out[48]? When I run this in Python, they do match. Best, André

Anonymous  Jun 25, 2021 
PDF Page 13
under Knowing your data

On page 13, under Knowing Your Data, there are four questions that you propose answering before modeling. I am able to follow all of them except one: "Is there missing data? Should I discard the rows with missing data or handle them differently?" Here, are you referring to samples that have missing feature values, and to whether those samples should be discarded or treated differently? Using the terms 'samples' and 'features' instead of 'rows' and 'data' would give readers more clarity.
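To make the two options in that question concrete, here is a minimal sketch using a made-up two-feature DataFrame (not the book's data); the column names are purely illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: each of the last two samples is missing one feature.
df = pd.DataFrame({"height": [1.7, np.nan, 1.6],
                   "weight": [60.0, 72.0, np.nan]})

dropped = df.dropna()           # discard any sample with a missing feature
imputed = df.fillna(df.mean())  # or keep all samples and impute column means

print(dropped.shape)               # (1, 2)
print(imputed.isna().sum().sum())  # 0
```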

Sreejith Nair  Jan 19, 2019 
PDF Page 16
Last paragraph

"The shape of the data array is the number of samples multiplied by the number of features." The phrase "multiplied by" is in error, because that would mean the shape (150, 4) is 600, but that is the amount of data, not its shape. It should say "the number of samples by the number of features," leaving out "multiplied." This mistake also appears on page 22, paragraph 2.
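The distinction the reporter draws can be checked directly; this sketch uses a placeholder array with the iris data's dimensions rather than the actual dataset:

```python
import numpy as np

X = np.zeros((150, 4))  # placeholder sized like the iris data

print(X.shape)  # (150, 4): 150 samples by 4 features
print(X.size)   # 600: the total number of values, which is not the shape
```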

Audrey Julia Walegwa Mbogho  May 09, 2021 
PDF Page 83
bottom

This line of code is throwing a warning:

X_train = data_train.date[:, np.newaxis]

<ipython-input-22-e8d89f13cbdc>:7: FutureWarning: Support for multi-dimensional indexing (e.g. `obj[:, None]`) is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.

Please provide an alternative. Thanks!
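One alternative, as the warning message itself suggests, is to convert the Series to a NumPy array before indexing. This sketch uses a made-up `data_train` DataFrame in place of the book's Citi Bike data:

```python
import numpy as np
import pandas as pd

# Stand-in for the book's data_train; 'date' holds numeric values here.
data_train = pd.DataFrame({"date": [1.0, 2.0, 3.0]})

# Instead of the deprecated: X_train = data_train.date[:, np.newaxis]
X_train = data_train.date.to_numpy()[:, np.newaxis]

print(X_train.shape)  # (3, 1)
```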

Anonymous  Nov 06, 2021 
Printed Page 87
Middle paragraph

"The trees that are built as part of the random forest are stored in the estimator_ attribute." 'estimator_' should read 'estimators_'.
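The plural attribute name is easy to confirm; a minimal sketch on a synthetic two-moons dataset (not necessarily the book's exact example):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
forest = RandomForestClassifier(n_estimators=5, random_state=2)
forest.fit(X, y)

# The individual trees live in the plural 'estimators_' attribute.
print(len(forest.estimators_))  # 5
```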

Anonymous  Jul 14, 2021 
PDF Page 91
bottom

The results of the code do not match the text. The code results are:

Accuracy of gbrt on training set 1.000
Accuracy of gbrt on test set 0.965

The results in the text are:

Accuracy on training set: 1.000
Accuracy on test set: 0.958

Anonymous  Nov 07, 2021 
PDF Page 103
Out[81] and In[82]

Out[81] in the text does not match the results posted on GitHub. Here is the text:

Out[81]:
Accuracy on training set: 1.00
Accuracy on test set: 0.63

"The model overfits quite substantially, with a perfect score on the training set and only 63% accuracy on the test set."

Here are the GitHub results:

Accuracy on training set: 0.90
Accuracy on test set: 0.94

In[82] has a typo; the correction is noted on GitHub but not in the text. Text:

plt.boxplot(X_train, manage_xticks=False)

Correct code:

plt.boxplot(X_train, manage_ticks=False)
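The keyword rename can be verified with any array; this sketch uses random data in place of the book's X_train, and the non-interactive Agg backend so it runs headless:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
X_train = rng.normal(size=(100, 5))  # stand-in for the book's X_train

# Matplotlib 3.1+ spells the keyword manage_ticks, not manage_xticks.
bp = plt.boxplot(X_train, manage_ticks=False)
print(len(bp["boxes"]))  # 5, one box per feature column
```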

Anonymous  Nov 08, 2021 
PDF Page 105
out[86] and last paragraph

The accuracy of the model in the text does not align with the GitHub results (or mine either), so the conclusions drawn in the text are in error. Here is the text:

Accuracy on training set: 0.988
Accuracy on test set: 0.972

"Here, increasing C allows us to improve the model significantly, resulting in 97.2% accuracy."

Here are the results from GitHub:

Accuracy on training set: 1.000
Accuracy on test set: 0.958

This will be an awkward one to explain to my students!

Anonymous  Nov 08, 2021 
PDF Page 152
In[24]

Stack Exchange suggests that updates to KNeighborsClassifier in scikit-learn are invalidating the code, but using older versions triggers other issues. Please revise! The In[24] code is:

from sklearn.neighbors import KNeighborsClassifier
# split the data in training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X_people, y_people, stratify=y_people, random_state=0)
# build a KNeighborsClassifier with using one neighbor:
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print("Test set score of 1-nn: {:.2f}".format(knn.score(X_test, y_test)))

Out[24] in the text is:

Test set score of 1-nn: 0.27

On GitHub the result is:

Test set score of 1-nn: 0.23

In my Jupyter notebook it is a disaster. The traceback ends in sklearn/model_selection/_split.py, in _validate_shuffle_split, with:

ValueError: With n_samples=0, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
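The ValueError reports n_samples=0, which means X_people was already empty before train_test_split, likely because the earlier fetch_lfw_people download or the masking step that builds X_people produced no rows, rather than because of KNeighborsClassifier itself. The same code pattern runs on current scikit-learn once the arrays are non-empty; a sketch with synthetic stand-ins for X_people and y_people:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for the book's X_people / y_people (which are built
# from fetch_lfw_people; if that step yields zero rows, the split fails
# with the n_samples=0 ValueError shown above).
rng = np.random.RandomState(0)
X_people = rng.normal(size=(200, 10))
y_people = rng.randint(0, 4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X_people, y_people, stratify=y_people, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print("Test set score of 1-nn: {:.2f}".format(knn.score(X_test, y_test)))
```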

Regis O'Connor  Nov 19, 2021 
Printed Page 212
In[35]

In the Chinese translation, In[35] reads "best_parms = {}"; I think it should be "best_params = {}". Also, the return statement has an indentation error.
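For reference, a minimal sketch of the kind of manual grid-search loop the entry refers to, with the correctly spelled best_params; this is a simplified standalone version on iris, not the book's exact In[35]:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

best_score = 0
best_params = {}  # correctly spelled: best_params, not best_parms
for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # train an SVC for each combination and keep the best test score
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        score = svm.score(X_test, y_test)
        if score > best_score:
            best_score = score
            best_params = {'C': C, 'gamma': gamma}

print("Best score: {:.2f}".format(best_score))
print("Best parameters: {}".format(best_params))
```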

Alice  Jul 28, 2021