Errata

Python Data Science Handbook

Errata for Python Data Science Handbook


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version Location Description Submitted by Date submitted
ePub Page pg 110
NaN: Missing numerical data

In[8]: vals2.sum(), vals2.min(), vals2.max()
Out[8]: (nan, nan, nan)

--------------------------------------------------------

Before the output, pandas 0.23.4 displays:

RuntimeWarning: invalid value encountered in reduce

Gregory Sherman  Dec 13, 2018 
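For readers hitting this warning, NumPy's NaN-aware aggregates give the intended results without it; a minimal sketch (array values assumed to mirror the chapter's example):

```python
import numpy as np

vals2 = np.array([1, np.nan, 3, 4])  # assumed to mirror the chapter's example

# Plain aggregates propagate NaN (and may trigger the RuntimeWarning above):
print(vals2.sum(), vals2.min(), vals2.max())                  # nan nan nan

# NaN-safe counterparts skip the missing value:
print(np.nansum(vals2), np.nanmin(vals2), np.nanmax(vals2))   # 8.0 1.0 4.0
```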
Chap 3
Example: Recipe Database

The section mentions downloading recipeitems-latest.json.gz; unfortunately, this file no longer contains data when downloaded from the S3 bucket, so the example code cannot be followed.

Anonymous  Dec 20, 2018 
Printed Page Page 84
The code example in "Binning Data"

This is an addition to my previous errata report for the same page number. In fact there are 19 bins, because the 20 points in the bins array define 19 bins. The idea behind this code example is really nice, but the example itself is messed up: some corner cases are not well thought through.

Peter Petrov  Mar 30, 2022 
Printed Page
Combining Datasets: Merge and Join

also found in https://jakevdp.github.io/PythonDataScienceHandbook/
This is actually more of a conceptual error.

Combining Datasets: Merge and Join:
In actuality, it is the Dataset on the Left Side in the pd.merge() function that generally drives the order of the key column in the Resultant Dataset. So it's more often the index on the right that gets discarded, not both.

i.e. df1 in df3 = pd.merge(df1, df2)
(here the index in df3 will be driven by df1).

The confusion arises because the key column (in this case 'employee') is
already sorted in alphabetical order in df1.

Try reversing the position of the datasets
i.e. set df3 = pd.merge(df2, df1)
(...and you will see the index of df3 driven by df2, not df1!)

This same issue often arises when coding in SQL.


Stephen Joseph  Jul 24, 2022 
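A quick check supports this report; pandas documents that an inner merge preserves the order of the left keys. A sketch with frames assumed to mirror the book's df1/df2:

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# The left operand drives the row order of the result:
print(pd.merge(df1, df2)['employee'].tolist())   # ['Bob', 'Jake', 'Lisa', 'Sue']
print(pd.merge(df2, df1)['employee'].tolist())   # ['Lisa', 'Bob', 'Jake', 'Sue']
```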
PDF Page HOG in Action: A Simple Face Detector, page 508
In[4]

Statement for
images = [color.rgb2gray(getattr(data, name)())
for name in imgs_to_use]

is not working in
3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.22621-SP0
scikit-image version: 0.19.2
numpy version: 1.21.5

Error messages are:
ValueError Traceback (most recent call last)
Input In [5], in <cell line: 6>()
1 from skimage import data, transform
3 imgs_to_use = ['camera', 'text', 'coins', 'moon',
4 'page', 'clock', 'immunohistochemistry',
5 'chelsea', 'coffee', 'hubble_deep_field']
----> 6 images = [color.rgb2gray(getattr(data, name)())
7 for name in imgs_to_use]

Input In [5], in <listcomp>(.0)
1 from skimage import data, transform
3 imgs_to_use = ['camera', 'text', 'coins', 'moon',
4 'page', 'clock', 'immunohistochemistry',
5 'chelsea', 'coffee', 'hubble_deep_field']
----> 6 images = [color.rgb2gray(getattr(data, name)())
7 for name in imgs_to_use]

File ~\anaconda3\lib\site-packages\skimage\_shared\utils.py:394, in channel_as_last_axis.__call__.<locals>.fixed_func(*args, **kwargs)
391 channel_axis = kwargs.get('channel_axis', None)
393 if channel_axis is None:
--> 394 return func(*args, **kwargs)
396 # TODO: convert scalars to a tuple in anticipation of eventually
397 # supporting a tuple of channel axes. Right now, only an
398 # integer or a single-element tuple is supported, though.
399 if np.isscalar(channel_axis):

File ~\anaconda3\lib\site-packages\skimage\color\colorconv.py:875, in rgb2gray(rgb, channel_axis)
834 @channel_as_last_axis(multichannel_output=False)
835 def rgb2gray(rgb, *, channel_axis=-1):
836 """Compute luminance of an RGB image.
837
838 Parameters
(...)
873 >>> img_gray = rgb2gray(img)
874 """
--> 875 rgb = _prepare_colorarray(rgb)
876 coeffs = np.array([0.2125, 0.7154, 0.0721], dtype=rgb.dtype)
877 return rgb @ coeffs

File ~\anaconda3\lib\site-packages\skimage\color\colorconv.py:140, in _prepare_colorarray(arr, force_copy, channel_axis)
137 if arr.shape[channel_axis] != 3:
138 msg = (f'the input array must have size 3 along `channel_axis`, '
139 f'got {arr.shape}')
--> 140 raise ValueError(msg)
142 float_dtype = _supported_float_type(arr.dtype)
143 if float_dtype == np.float32:

ValueError: the input array must have size 3 along `channel_axis`, got (512, 512)

Anonymous  Nov 08, 2022 
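A possible workaround (not from the book): newer scikit-image rejects 2-D input to rgb2gray, so the conversion should be applied only to 3-channel images. The helper below is hypothetical and reimplements the grayscale weighting in plain NumPy so the idea can be shown without scikit-image installed:

```python
import numpy as np

def to_gray(img):
    """Hypothetical helper: pass 2-D (already grayscale) images through,
    and reduce 3-channel RGB images using rgb2gray's luminance weights."""
    if img.ndim == 2:
        return img
    return img[..., :3] @ np.array([0.2125, 0.7154, 0.0721])

gray_in = to_gray(np.zeros((4, 4)))     # 2-D input is returned unchanged
rgb_in = to_gray(np.ones((4, 4, 3)))    # 3-channel input becomes 2-D
print(gray_in.shape, rgb_in.shape)      # (4, 4) (4, 4)
```

With scikit-image itself, the same guard would be `img if img.ndim == 2 else color.rgb2gray(img)` inside the list comprehension.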
Other Digital Version xxii
4th paragraph starting with Miniconda

I am not reporting a text error in the Python Data Science Handbook.
I cannot install Miniconda, although I followed the procedure outlined in the book on page xxii.
The procedure proposes:
mkdir -p ~/miniconda3
curl ttps://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

The terminal output after typing these lines looks as follows (abridged):

Last login: Sun Dec 10 00:12:10 on console
(base) petermatthiessen@Peters-MacBook-Pro ~ % ipython
zsh: command not found: ipython
(base) petermatthiessen@Peters-MacBook-Pro ~ % shasum -a 256 0c9d8ae96c110230a41c0441d5d486d47b627f594090de52989d01d04d18d8eee
shasum: 0c9d8ae96c110230a41c0441d5d486d47b627f594090de52989d01d04d18d8eee: No such file or directory
(base) petermatthiessen@Peters-MacBook-Pro ~ % mkdir -p ~/miniconda3
(base) petermatthiessen@Peters-MacBook-Pro ~ % curl ttps://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/miniconda3/miniconda.sh
  (curl reports only 378 bytes received)
(base) petermatthiessen@Peters-MacBook-Pro ~ % bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
/Users/petermatthiessen/miniconda3/miniconda.sh: line 1: syntax error near unexpected token `newline'

Any thoughts how could I successfully install Miniconda?

Best regards,

Peter M.

Peter Matthiessen  Dec 13, 2023 
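One likely culprit, visible in the transcript above, is the mistyped URL scheme ("ttps://" instead of "https://"): curl then downloads only a tiny error page, which bash cannot execute as a script. A hedged sketch of the corrected commands, assuming that typo is the only problem:

```shell
# Assumption: the failure stems from the missing "h" in the URL scheme.
mkdir -p ~/miniconda3
curl -L https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh \
     -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -f ~/miniconda3/miniconda.sh
```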
Printed Page 11
2nd paragraph

Python interpreter, not IPython interpreter

Anonymous  Oct 26, 2019 
ePub Page 28
middle on the page

The command
"In[9]: %load_ext line_profiler"
seems not to work anymore. I installed line_profiler with pip as indicated on the page. However, since I installed Anaconda as indicated in the book, I also tried installing line_profiler with conda.
That worked!
Best regards
Tony

Tony Hürliamnn  Aug 04, 2019 
PDF Page 42
1st paragraph at the end

df['NO. OBESE'].groupby(d['GRADE LEVEL']).aggregate([sum, mean, std])
should be
df['NO. OBESE'].groupby(df['GRADE LEVEL']).aggregate([np.sum, np.mean, np.std])
and any reference to d should be replaced with df in this chapter

vOOda  Aug 06, 2021 
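A sketch of the proposed correction on hypothetical data (column values invented for illustration); note that recent pandas also accepts the string aliases 'sum', 'mean', 'std' in place of the NumPy functions:

```python
import pandas as pd

# Hypothetical data with the chapter's column names
df = pd.DataFrame({'GRADE LEVEL': ['ELEMENTARY', 'ELEMENTARY', 'MIDDLE', 'MIDDLE'],
                   'NO. OBESE': [10, 20, 30, 50]})

# The corrected call: both Series come from the same frame, df
result = df['NO. OBESE'].groupby(df['GRADE LEVEL']).aggregate(['sum', 'mean', 'std'])
print(result)
```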
Printed Page 43
Last paragraph

"Comma separated tuples of indices" should be "comma separated list of indices"

Massimiliano volpi  Apr 03, 2023 
Printed Page 46
Section Subarrays as no-copy views

In Python, slices are also no-copy views. Therefore, the sentence «This is one area in which NumPy array slicing differs from Python list slicing.» is wrong.

The only difference is when we are using advanced/fancy indexing. In this case, NumPy creates copies.



Ivo Tavares  Nov 18, 2020 
ePub Page 64
Table 2-3. Aggregation functions available in NumPy

np.mean np.nanmean Compute median of elements"

---
should be "Compute mean ..."

Gregory Sherman  Dec 12, 2018 
PDF Page 65
Figure 2-4

The equation of the third example shown in the figure should be "np.arange(3)[:, np.newaxis]+np.arange(3)", not "np.ones((3,1))+np.arange(3)".

Anonymous  Jan 26, 2017 
Printed Page 65
Figure 2-4

wrong: np.ones((3, 1)) + np.arange(3)

correct: np.arange(3).reshape((3, 1)) + np.arange(3)

correct: np.arange(3)[:, np.newaxis] + np.arange(3)

Anonymous  Nov 02, 2019 
Printed Page 65
Figure 2.4

import numpy as np

np.ones((3, 1)) + np.arange(3)

Outcome should be:
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.]])

André Roukema  Aug 14, 2022 
PDF Page 65
Figure 2-4

In Figure 2-4 on page 65, the first box of the third example must contain
1. 1. 1.
1. 1. 1.
1. 1. 1.
and the last box must contain
1. 2. 3.
1. 2. 3.
1. 2. 3.

Anonymous  Jan 06, 2023 
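For reference, a runnable sketch of the two variants discussed in these reports:

```python
import numpy as np

# The correction proposed above: both operands broadcast
a = np.arange(3).reshape((3, 1)) + np.arange(3)
print(a)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# The figure's printed expression instead yields three identical rows:
b = np.ones((3, 1)) + np.arange(3)
print(b)
# [[1. 2. 3.]
#  [1. 2. 3.]
#  [1. 2. 3.]]
```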
PDF Page 75
In code snippet

print("Rainy days with < 0.1 inches :", np.sum((inches > 0) &
(inches < 0.2)))

The screen prints "Rainy days with < 0.1 inches" while the program is calculating rainy days with < 0.2 inches.

Anonymous  Feb 01, 2017 
Printed Page 75
5th paragraph

wrong: np.sum((inches > 0) & (inches < 0.2))

correct: np.sum((inches > 0) & (inches < 0.1))

Anonymous  Nov 02, 2019 
Printed Page 75
2nd paragraph

where reads:
"...the equivalence of A AND B and NOT (A OR B)..."

should be read:
"...the equivalence of A AND B and NOT ((NOT A) OR (NOT B))..."

Pedro Sousa  Feb 04, 2020 
ePub Page 80
Example: Binning Data

For example, imagine we have 1,000 values
.
.
.
x = np.random.randn(100)
---
Both should be 100 or 1000

Gregory Sherman  Dec 12, 2018 
PDF Page 82
In[17]

In[17]: plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1], facecolor='none', s=200);

The code above won't show the large circles on the plot, as it is missing "edgecolor" . It is corrected by the following code:

In[17]: plt.scatter(selection[:, 0], selection[:, 1], facecolor='none', edgecolor='b', s=200);

utjo3105  Apr 08, 2017 
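A runnable sketch of the proposed fix (the data generated here is hypothetical; the book's X comes from an earlier cell), using a non-interactive backend:

```python
import matplotlib
matplotlib.use('Agg')            # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # hypothetical stand-in for the book's data
X = rng.multivariate_normal([0, 0], [[1, 2], [2, 5]], 100)
selection = X[rng.choice(X.shape[0], 20, replace=False)]

plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
# Without edgecolor, the hollow markers are invisible; with it they show:
sc = plt.scatter(selection[:, 0], selection[:, 1],
                 facecolor='none', edgecolor='b', s=200)
```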
Printed Page 84
The code example in "Binning Data"

Actually
np.searchsorted(bins, x)

may return an array which contains an index equal to 20 (the number of bins). Then np.add.at(counts, i, 1) will raise an error.

This problem doesn't happen only because [-5,5] is a large interval and we're lucky that np.random.randn(100) didn't return any number bigger than 5. Of course the probability that np.random.randn(100) would return a number larger than 5 is small, but it's not zero.

How to prove there's a problem?

Say we try the same example using

bins = np.linspace(-1.5, 1.5, 20)

instead of

bins = np.linspace(-5, 5, 20)

Then the problem does manifest itself.

Peter Petrov  Mar 30, 2022 
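A sketch of one possible guard (the clipping is my addition, not the book's code): with the narrower bins suggested above, np.searchsorted can return len(counts), which np.add.at rejects unless the indices are clipped first:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
bins = np.linspace(-1.5, 1.5, 20)   # 20 edges; values beyond 1.5 overflow

counts = np.zeros_like(bins)
i = np.searchsorted(bins, x)        # may contain 20 == len(counts)
i = np.clip(i, 0, len(counts) - 1)  # guard against the out-of-range index
np.add.at(counts, i, 1)
print(int(counts.sum()))            # 100: every sample landed in a bin
```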
PDF Page 89
First line of code


In[14]: X = rand.rand(10, 2)

Should be

In[14]: X = np.random.random((10, 2))

Anonymous  Mar 10, 2019 
PDF Page 93
3rd paragraph

>Using the equivalence of A AND B and NOT (A OR B)

This is not equivalent.

A = true
B = true

A and B = true
NOT (A OR B) = NOT (true) = false

The example that follows is also incorrect since it is based off that.

The wanted equivalence is, I suppose,

A and B = NOT (NOT (A) OR NOT (B)) by the Morgan's law

Anonymous  May 27, 2019 
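A truth-table check confirms this report; the corrected identity is De Morgan's law:

```python
# A AND B is equivalent to NOT ((NOT A) OR (NOT B)), not to NOT (A OR B):
for A in (False, True):
    for B in (False, True):
        assert (A and B) == (not ((not A) or (not B)))

# Counterexample to the book's wording, with A = B = True:
print((True and True), (not (True or True)))   # True False
```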
ePub Page 94
Series as specialized dictionary


In[11]: population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population

Out[11]: California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
----
Using version 0.23.4, the keys are output in the same order as in the dictionary
(not alphabetized), so "In[13]: population['California':'Illinois']" retrieves the entire series rather than the first three elements (ordered alphabetically by keys : 'California', 'Florida', 'Illinois')

The same issue is seen in later examples

Gregory Sherman  Dec 12, 2018 
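A sketch of the behavior described (pandas 0.23+ on Python 3.6+ preserves dictionary insertion order in the Series index):

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})

# The index keeps insertion order, so label slices follow it too:
print(population.index.tolist())
print(population['California':'New York'].index.tolist())   # three entries
```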
Printed Page 94
top of page

Wouldn't it be better to say

# Get first element of data
# Get first tuple of data

Because data is a 1D-array and has no rows like a 2D-array.

The term "row" is misleading here because it implies that we have to do with a 2D data structure, which is not the case in my opinion.

Andrea P. Mathis  Nov 05, 2019 
PDF Page 95
Second sentence, first paragraph

I think it is more clear to state that characters "<" and ">" are used to specify the ordering convention for significant "bytes" instead of "bits".

Anonymous  Feb 22, 2017 
ePub Page 103
DataFrame as two-dimensional array

The ix indexer allows a hybrid of these two approaches:
In[30]: data.ix[:3, :'pop']
--------------------------------------------------------
In pandas 0.23.4, this results in:

"... DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing"

plus the California, Texas, and New York rows

Gregory Sherman  Dec 13, 2018 
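For reference, .ix was removed entirely in pandas 1.0; the hybrid selection can be reproduced by chaining the two indexers (frame values assumed from the chapter):

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662, 141297],
                     'pop': [38332521, 26448193, 19651127]},
                    index=['California', 'Texas', 'New York'])

# data.ix[:3, :'pop'] mixed positional rows with label columns;
# chaining iloc (positional) and loc (label) achieves the same result:
subset = data.iloc[:3].loc[:, :'pop']
print(subset.index.tolist())   # ['California', 'Texas', 'New York']
```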
PDF Page 104
1st paragraph

The related subsection title at the bottom of page 103 is "DataFrame as specialized dictionary". However, the explanation in the first paragraph of page 104 contains the following:
"Because of this, it is probably better to think about DataFrames as generalized
dictionaries rather than generalized arrays, though both ways of looking at the situation can be useful."

The subsection title and its corresponding explanation are somewhat conflicting: specialized vs. generalized.

One of them should be corrected for consistency.

Hongsoog Kim  Aug 21, 2017 
PDF, ePub Page 111
2nd Code Block

Use of 'axis' instead of 'axes'.

David  May 22, 2017 
ePub Page 117
Explicit MultiIndex constructors

Similarly, you can construct the MultiIndex directly using its internal encoding by passing levels (a list of lists containing available index values for each level) and labels (a list of lists that reference these labels):
-----------------------------------------------------------------------------------------------------------------
It seems that another word should be in place of the final one ("labels")

Gregory Sherman  Dec 19, 2018 
ePub Page 126
pg 126

In[8]: df3 = make_df('AB', [0, 1])
df4 = make_df('CD', [0, 1])
print(df3); print(df4); print(pd.concat([df3, df4], axis='col'))

---------------------------------------------------------------------------------

pd.concat() fails ; needs to be "axis='columns'"

Gregory Sherman  Dec 19, 2018 
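A minimal reproduction of the fix (frames built inline rather than with the book's make_df helper):

```python
import pandas as pd

df3 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df4 = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']})

# axis='col' raises a ValueError; the accepted spellings are 1 or 'columns':
result = pd.concat([df3, df4], axis='columns')
print(result.columns.tolist())   # ['A', 'B', 'C', 'D']
```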
ePub Page 128
Concatenation with joins

In[13]: df5 = make_df('ABC', [1, 2])
df6 = make_df('BCD', [3, 4])
print(df5); print(df6); print(pd.concat([df5, df6])
---------------------------------------------------------------------
missing closing parenthesis at end of last print() call

Before third print() output:
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.

Gregory Sherman  Dec 19, 2018 
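A sketch of the warning's suggested fix (frames built inline rather than with make_df); passing sort explicitly silences the FutureWarning:

```python
import pandas as pd

df5 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 'C': ['C1', 'C2']},
                   index=[1, 2])
df6 = pd.DataFrame({'B': ['B3', 'B4'], 'C': ['C3', 'C4'], 'D': ['D3', 'D4']},
                   index=[3, 4])

# sort=False keeps the original column order and silences the warning:
result = pd.concat([df5, df6], sort=False)
print(result.columns.tolist())   # ['A', 'B', 'C', 'D']
```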
130
Final paragraph on the page

The sentence reads "Seeing this, you might wonder why would we would bother with hierarchical indexing at all."

I believe that it should be "you might wonder why we would bother" rather than "you might wonder why would we *would* bother."

sterlinm  Jun 01, 2017 
Printed Page 151
Code below 2nd paragraph

code in the book is different than that on the website
on page 151, the 6th line of code including the commented line from the book (which will not work):

<p style= 'font-family:"Courier New", Courier, monospace'>{0}{1}
"""
code from the website (which will work):

<p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
</div>"""

David Walden  Jan 18, 2024 
ePub Page 159
In[12]

monte.str.findall(r'^[^AEIOU].*[^aeiou]$')
--------------------------------------------------------
The case of the vowels in the regular expression makes no difference.
If the first and second character classes are switched, the results are identical.
So, it appears that matching is case-insensitive by default (unlike Python's re) - can it be made case-sensitive?

Gregory Sherman  Dec 25, 2018 
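A quick check with plain re (which the pandas .str methods wrap) suggests matching is in fact case-sensitive by default; case-insensitivity has to be requested explicitly with re.IGNORECASE:

```python
import re

# Character classes are case-sensitive by default:
print(re.findall(r'[aeiou]', 'AEIOU'))                  # []

# Opting in to case-insensitive matching:
print(re.findall(r'[aeiou]', 'AEIOU', re.IGNORECASE))   # ['A', 'E', 'I', 'O', 'U']
```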
Printed Page 164
2nd paragraph

planets.groupby('method')['year'].describe().unstack()

I think the unstack method can be omitted to get the DataFrame.

Applying the unstack method yields a (multi-indexed) Series, in my opinion.

Andrea P. Mathis  Nov 17, 2019 
Printed Page 164-165
last paragraph of 164

In the "Dispatch methods" section, in the code "planets.groupby('method')['year'].describe().unstack()", calling the unstack method (using parentheses) returns a 'pandas.core.series.Series', whereas it should return "<bound method DataFrame.unstack of method". Therefore, the parentheses should be omitted from unstack to get the desired result.

Correct code: planets.groupby('method')['year'].describe().unstack

Minhaz Uddin  Jul 31, 2022 
Printed Page 164

'Iteration over groups' Unclear to me. Missing details.

Holger Eich  Mar 31, 2024 
Printed Page 170
2nd paragaph

In the section "Pivot Tables", the sentence "We have seen how the GroupBy abstraction let us..." should be "We have seen how the GroupBy abstraction works; let us..."

Minhaz Uddin  Jul 31, 2022 
Printed Page 197
In[25], In[26], In[28]

This:
In[25]: from pandas_datareader import data
goog = data.DataReader('GOOG', start='2004', end='2016', data_source='google')
goog.head()

In[26]: goog = goog['Close']

In[27]: %matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()

In[28]: goog.plot( )

Should be:
In[25]: from pandas_datareader import data
aapl = data.DataReader('AAPL', start='2004', end='2016', data_source='yahoo')
aapl.head( )

In[26]: aapl = aapl['Close']

In[27]: %matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()

In[28]: aapl.plot( )

---
Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.

Dyanne Ahn  Aug 14, 2020 
Printed Page 198-199
In[29], In[30]

This:
In[29]: goog.plot(alpha=0.5, style='-')
goog.resample('BA').mean().plot(style=':')
goog.asfreq('BA').plot(style='--');
plt.legend(['input', 'resample', 'asfreq'],
loc='upper left');

In[30]: fig, ax = plt.subplots(2, sharex=True)
data = goog.iloc[:10]

data.asfreq('D').plot(ax=ax[0], marker='o')

data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);

Should be:
In[29]: aapl.plot(alpha=0.5, style='-')
aapl.resample('BA').mean().plot(style=':')
aapl.asfreq('BA').plot(style='--');
plt.legend(['input', 'resample', 'asfreq'],
loc='upper left');

In[30]: fig, ax = plt.subplots(2, sharex=True)
data = aapl.iloc[:10]

data.asfreq('D').plot(ax=ax[0], marker='o')

data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);

---
Name 'goog' is not defined because Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.

Dyanne Ahn  Aug 14, 2020 
Printed Page 199-200
In[31], In[32]

This:
In[31]: fig, ax = plt.subplots(3, sharey=True)

goog = goog.asfreq('D', method='pad')

goog.plot(ax=ax[0])
goog.shift(900).plot(ax=ax[1])
goog.tshift(900).plot(ax=ax[2])

local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')

ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[2].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')

ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[2].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')

ax[2].legend(['tshift(900)'], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');

In[32]: ROI = 100 * (goog.tshift(-365) / goog - 1)
ROI.plot()
plt.ylabel('% Return on Investment');

Should be:
In[31]: fig, ax = plt.subplots(3, sharey=True)

aapl = aapl.asfreq('D', method='pad')

aapl.plot(ax=ax[0])
aapl.shift(900).plot(ax=ax[1])
aapl.tshift(900).plot(ax=ax[2])

local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')

ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[2].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')

ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[2].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')

ax[2].legend(['tshift(900)'], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');

In[32]: ROI = 100 * (aapl.tshift(-365) / aapl - 1)
ROI.plot()
plt.ylabel('% Return on Investment');

---
Name 'goog' is not defined because Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.

Dyanne Ahn  Aug 14, 2020 
PDF Page 201
2nd paragraph

In the following paragraph of Rolling Window
"Rolling statistics are a third type of time series–specific operation implemented by
Pandas. These can be accomplished via the rolling() attribute of Series and Data
Frame objects, which returns a view similar to what we saw with the groupby operation (see “Aggregation and Grouping” on page 158).This rolling view makes available a number of aggregation operations by default."

'rolling() attribute' should be corrected to 'rolling() method'

source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

Hongsoog Kim  Aug 31, 2017 
Printed Page 201
In[33]

This:
In[33]: rolling = goog.rolling(365, center=True)

data = pd.DataFrame({'input': goog,
'one-year rolling_mean': rolling.mean(),
'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)

Should be:
In[33]: rolling = aapl.rolling(365, center=True)

data = pd.DataFrame({'input': aapl,
'one-year rolling_mean': rolling.mean(),
'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)

---
Name 'goog' is not defined because Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.

Dyanne Ahn  Aug 14, 2020 
Printed Page 205
First Sentence / In[41]

The first sentence references pd.rolling_mean(), but in In[41] line 2 sum() is called instead. In[41] line 3 then sets plt.ylabel to 'mean hourly count', which seems to be in accordance with the first sentence of the text but in opposition to the given code. This entire paragraph seems mixed up. Clarification?

Anonymous  Aug 21, 2017 
PDF Page 245
First code-block.

The code listing:
print(df5); print(df6); print(pd.concat([df5, df6])
It is missing a closing parenthesis ')' at the end, to close the print statement.

Nikolaj Gilstrøm  Oct 22, 2020 
Printed Page 263
line 9

plt.axes or fig.add_axes() numbers represent: [left, bottom, width, height] but in the book it's written as [bottom, left, width, height].

The reference here: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.figure.Figure.html

Anonymous  Dec 19, 2019 
Printed Page 283
In[3] 2nd line of code

ax = plt.axes(axisbg='#E6E6E6')

should be

ax = plt.axes(facecolor='#E6E6E6')

Documented on stackoverflow (not by me) at https://stackoverflow.com/questions/50504053/attributeerror-unknown-property-axisbg

Mark Pedigo  Jan 06, 2019 
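A runnable sketch of the fix on a non-interactive backend (axisbg was deprecated in Matplotlib 2.0 and later removed in favor of facecolor):

```python
import matplotlib
matplotlib.use('Agg')   # headless backend for this sketch
import matplotlib.pyplot as plt

# facecolor replaces the removed axisbg keyword:
ax = plt.axes(facecolor='#E6E6E6')
print(ax.get_facecolor())   # RGBA tuple for the light gray background
```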
Printed Page 311
Last paragraph

Seaborn provides an API on top of Matplotlib that offers sane choices.......


In this paragraph, instead of 'sane', it should be 'same'.

Minhaz Uddin  Aug 14, 2022 
PDF Page 344
last paragraph

In the last paragraph of page 344:
"Often one point of confusion is how the target array differs from the other features
columns. The distinguishing feature of the target array is that it is usually the quantity we want to predict from the data: in statistical terms, it is the dependent variable. For example, in the preceding data we may wish to construct a model that can predict the species of flower based on the other measurements; in this case, the species column would be considered the feature."

Since the species column contains the value to predict using model, it should be interpreted as target array rather than feature. The last sentence should be corrected as follows:
"For example, in the preceding data we may wish to construct a model that can predict the species of flower based on the other measurements; in this case, the species column would be considered the target array."

Hongsoog Kim  Sep 11, 2017 
PDF Page 350
Last code snippet

The last piece of code snippet on book is
----------------------
In[14]: plt.scatter(x, y)
plt.plot(xfit, yfit);
----------------------
However, xfit should be Xfit; that is, the X should be uppercase, or the example code will throw an exception.
BTW, this book is great; looking forward to the 2nd edition!

Timothy Liu  Jan 14, 2017 
PDF Page 351
1 paragraph of code


In[15]: from sklearn.cross_validation import train_test_split

should be replaced with

In[15]: from sklearn.model_selection import train_test_split

since the train_test_split function is part of the model_selection module, not cross_validation!

Ahac  Jun 06, 2020 
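A sketch confirming the corrected import (sklearn.cross_validation was removed in scikit-learn 0.20; the data here is hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split   # corrected import path

X = np.arange(20).reshape(10, 2)   # hypothetical feature matrix
y = np.arange(10)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0,
                                                test_size=0.3)
print(Xtrain.shape, Xtest.shape)   # (7, 2) (3, 2)
```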
PDF Page 358
Code near top of page

In the code:

ax.imshow(digits.images[i], cmap='binary', interpolation='nearest')

suggest changing 'digits.images[i]' to 'Xtest[i].reshape(8,8)' so that images associated with ytest are displayed.

The writing and explanations in this book are clear and of the highest standard. Thank you!

Michael Laszlo  Mar 31, 2017 
PDF Page 363
1st paragraph

In the 1st paragraph of page 363,
"Because we have 150 samples, the leave-one-out cross-validation yields scores for 150 trials, and the score indicates either successful (1.0) or unsuccessful (0.0) prediction. Taking the mean of these gives an estimate of the error rate:"

The mean of scores should be interpreted as 'estimate of the prediction accuracy' and corrected as follows:
"Because we have 150 samples, the leave-one-out cross-validation yields scores for 150 trials, and the score indicates either successful (1.0) or unsuccessful (0.0) prediction. Taking the mean of these gives an estimate of the prediction accuracy:

Hongsoog Kim  Sep 12, 2017 
PDF Page 374
Input 18

Should read

`from sklearn.model_selection import GridSearchCV`

and not

`from sklearn.grid_search import GridSearchCV`

David Lindelof  Oct 25, 2019 
PDF Page 383
2nd equation

The denominator of the LHS of the equation should be P (L2 | features)

Michele Floris  Jan 20, 2017 
Printed Page 383
Second formula bellow the second paragraph.

The equation for a 2-label Bayes classifier on page 383 has an incorrect subindex in the denominator of the left-hand term: it is 1 while it should be 2.

In latex:
The original equation:
\frac{P(L_{1}\mid features)}{P(L_{1}\mid features)} = \frac{P(features \mid L_{1}) \, P(L_{1})}{P(features \mid L_{2}) \, P(L_{2})}

The correct one:
\frac{P(L_{1}\mid features)}{P(L_{2}\mid features)} = \frac{P(features \mid L_{1}) \, P(L_{1})}{P(features \mid L_{2}) \, P(L_{2})}

Pablo Lorenzatto  Mar 28, 2017 
PDF Page 410
Code above Fig. 5-57

The `N` parameter of `plot_svm` is not used; the first lines of the function should be changed to:
```
def plot_svm(N=10, ax=None):
    X, y = make_blobs(n_samples=N, centers=2,
                      random_state=0, cluster_std=0.60)
```

Michele Floris  Apr 03, 2017 
PDF Page 426
Code snippet

bag.fit(X,y) is not needed (model.fit is already included in visualize_classifier defined on page 423)

Michele Floris  Apr 13, 2017