The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Version | Location | Description | Submitted by | Date submitted
ePub |
Page 110
NaN: Missing numerical data |
In[8]: vals2.sum(), vals2.min(), vals2.max()
Out[8]: (nan, nan, nan)
--------------------------------------------------------
before output, pandas 0.23.4 displays:
RuntimeWarning: invalid value encountered in reduce
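For reference, NumPy's NaN-safe aggregates avoid these nan results (a minimal sketch):

```python
import numpy as np

vals2 = np.array([1, np.nan, 3, 4])  # array containing a missing value

# The plain aggregates propagate NaN...
print(vals2.sum())  # nan

# ...while the NaN-safe counterparts ignore it:
print(np.nansum(vals2), np.nanmin(vals2), np.nanmax(vals2))  # 8.0 1.0 4.0
```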
|
Gregory Sherman |
Dec 13, 2018 |
|
Chap 3
Example: Recipe Database |
The section mentions downloading recipeitems-latest.json.gz; unfortunately, this file no longer contains data when downloaded from the S3 bucket, so the example code cannot be followed along.
|
Anonymous |
Dec 20, 2018 |
Printed |
Page Page 84
The code example in "Binning Data" |
This is just an addition to my previous errata report for the same page number. In fact there are 19 bins, because the 20 points in the bins array define 19 bins. The idea behind this code example is nice, really... but the example itself is messed up; some corner cases are not well thought through.
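The edge-vs-bin count is easy to confirm (a minimal check):

```python
import numpy as np

edges = np.linspace(-5, 5, 20)  # 20 equally spaced edge points, as in the book
n_bins = len(edges) - 1         # consecutive pairs of edges define the bins

print(len(edges), n_bins)  # 20 19
```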
|
Peter Petrov |
Mar 30, 2022 |
Printed |
Combining Datasets: Merge and Join |
also found in https://jakevdp.github.io/PythonDataScienceHandbook/
This is actually more of a conceptual error.
Combining Datasets: Merge and Join:
In actuality, it is the Dataset on the Left Side in the pd.merge() function that generally drives the order of the key column in the Resultant Dataset. So it's more often the index on the right that gets discarded, not both.
i.e. df1 in df3 = pd.merge(df1, df2)
(here the index in df3 will be driven by df1).
The confusion arises because the key column (in this case 'employee') is
already sorted in alphabetical order in df1.
Try reversing the position of the datasets
i.e. set df3 = pd.merge(df2, df1)
(...and you will see the index of df3 driven by df2, not df1!)
This same issue often arises when coding in SQL.
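The described ordering can be checked with a small sketch (the frames below are illustrative, modeled on the book's employee example):

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                    'group': ['IT', 'HR', 'IT']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake'],
                    'hire_date': [2004, 2008, 2012]})

# The left frame generally drives the row order of the merged result:
print(pd.merge(df1, df2)['employee'].tolist())  # ['Bob', 'Jake', 'Lisa']
print(pd.merge(df2, df1)['employee'].tolist())  # ['Lisa', 'Bob', 'Jake']
```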
|
Stephen Joseph |
Jul 24, 2022 |
PDF |
Page HOG in Action: A Simple Face Detector, page 508
In[4] |
The statement
images = [color.rgb2gray(getattr(data, name)())
          for name in imgs_to_use]
does not work in Python
3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.22621-SP0
scikit-image version: 0.19.2
numpy version: 1.21.5
Error messages are:
ValueError Traceback (most recent call last)
Input In [5], in <cell line: 6>()
1 from skimage import data, transform
3 imgs_to_use = ['camera', 'text', 'coins', 'moon',
4 'page', 'clock', 'immunohistochemistry',
5 'chelsea', 'coffee', 'hubble_deep_field']
----> 6 images = [color.rgb2gray(getattr(data, name)())
7 for name in imgs_to_use]
Input In [5], in <listcomp>(.0)
1 from skimage import data, transform
3 imgs_to_use = ['camera', 'text', 'coins', 'moon',
4 'page', 'clock', 'immunohistochemistry',
5 'chelsea', 'coffee', 'hubble_deep_field']
----> 6 images = [color.rgb2gray(getattr(data, name)())
7 for name in imgs_to_use]
File ~\anaconda3\lib\site-packages\skimage\_shared\utils.py:394, in channel_as_last_axis.__call__.<locals>.fixed_func(*args, **kwargs)
391 channel_axis = kwargs.get('channel_axis', None)
393 if channel_axis is None:
--> 394 return func(*args, **kwargs)
396 # TODO: convert scalars to a tuple in anticipation of eventually
397 # supporting a tuple of channel axes. Right now, only an
398 # integer or a single-element tuple is supported, though.
399 if np.isscalar(channel_axis):
File ~\anaconda3\lib\site-packages\skimage\color\colorconv.py:875, in rgb2gray(rgb, channel_axis)
834 @channel_as_last_axis(multichannel_output=False)
835 def rgb2gray(rgb, *, channel_axis=-1):
836 """Compute luminance of an RGB image.
837
838 Parameters
(...)
873 >>> img_gray = rgb2gray(img)
874 """
--> 875 rgb = _prepare_colorarray(rgb)
876 coeffs = np.array([0.2125, 0.7154, 0.0721], dtype=rgb.dtype)
877 return rgb @ coeffs
File ~\anaconda3\lib\site-packages\skimage\color\colorconv.py:140, in _prepare_colorarray(arr, force_copy, channel_axis)
137 if arr.shape[channel_axis] != 3:
138 msg = (f'the input array must have size 3 along `channel_axis`, '
139 f'got {arr.shape}')
--> 140 raise ValueError(msg)
142 float_dtype = _supported_float_type(arr.dtype)
143 if float_dtype == np.float32:
ValueError: the input array must have size 3 along `channel_axis`, got (512, 512)
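A possible workaround (an illustrative sketch, not the book's code): newer scikit-image ships some of these sample images as 2D grayscale arrays already, so convert only images that actually carry an RGB channel axis. The helper below is hypothetical and uses plain NumPy with the same luminance weights shown in the traceback above:

```python
import numpy as np

def to_gray(img):
    """Convert to grayscale only when the image has an RGB channel axis."""
    if img.ndim == 2:          # already grayscale -- pass through unchanged
        return img
    # luminance weights used by skimage.color.rgb2gray
    return img[..., :3] @ np.array([0.2125, 0.7154, 0.0721])

gray = to_gray(np.zeros((4, 4)))         # 2D input passes through
from_rgb = to_gray(np.ones((4, 4, 3)))   # RGB input is reduced to 2D
print(gray.shape, from_rgb.shape)  # (4, 4) (4, 4)
```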
|
Anonymous |
Nov 08, 2022 |
Printed |
Page 11
2nd paragraph |
Should be "Python interpreter", not "IPython interpreter".
|
Anonymous |
Oct 26, 2019 |
ePub |
Page 28
middle on the page |
The command
"In[9]: %load_ext line_profiler"
seems not to work anymore. I installed line_profiler with pip as indicated on the page. However, since I installed Anaconda as indicated in the book, I also tried installing line_profiler with conda.
That worked!
Best regards,
Tony
|
Tony Hürliamnn |
Aug 04, 2019 |
PDF |
Page 42
1st paragraph at the end |
df['NO. OBESE'].groupby(d['GRADE LEVEL']).aggregate([sum, mean, std])
should be
df['NO. OBESE'].groupby(df['GRADE LEVEL']).aggregate([np.sum, np.mean, np.std])
and any reference to d should be replaced with df in this chapter
|
vOOda |
Aug 06, 2021 |
Printed |
Page 43
Last paragraph |
"Comma separated tuples of indices" should be "comma separated list of indices"
|
Massimiliano volpi |
Apr 03, 2023 |
Printed |
Page 46
Section Subarrays as no-copy views |
In Python, slices are also no-copy views. Therefore, the sentence "This is one area in which NumPy array slicing differs from Python list slicing." is wrong.
The only difference is when we are using advanced/fancy indexing. In that case, NumPy creates copies.
|
Ivo Tavares |
Nov 18, 2020 |
ePub |
Page 64
Table 2-3. Aggregation functions available in NumPy |
np.mean np.nanmean Compute median of elements"
---
should be "Compute mean ..."
|
Gregory Sherman |
Dec 12, 2018 |
PDF |
Page 65
Figure 2-4 |
The equation of the third example shown in the figure should be "np.arange(3)[:, np.newaxis]+np.arange(3)", not "np.ones((3,1))+np.arange(3)".
|
Anonymous |
Jan 26, 2017 |
Printed |
Page 65
Figure 2-4 |
wrong: np.ones((3, 1) + np.arange(3)
correct: np.arange(3).reshape((3, 1)) + np.arange(3)
correct: np.arange(3)[:, np.newaxis] + np.arange(3)
|
Anonymous |
Nov 02, 2019 |
Printed |
Page 65
Figure 2.4 |
import numpy as np
np.ones((3, 1)) + np.arange(3)
Outcome should be:
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.]])
|
André Roukema |
Aug 14, 2022 |
PDF |
Page 65
Figure 2-4 |
In Figure 2-4 on page 65, the first box of the third example must contain
1. 1. 1.
1. 1. 1.
1. 1. 1.
and the last box must contain
1. 2. 3.
1. 2. 3.
1. 2. 3.
|
Anonymous |
Jan 06, 2023 |
PDF |
Page 75
In code snippet |
print("Rainy days with < 0.1 inches :", np.sum((inches > 0) &
(inches < 0.2)))
The screen prints "Rainy days with < 0.1 inches" while the program actually calculates rainy days with < 0.2 inches.
|
Anonymous |
Feb 01, 2017 |
Printed |
Page 75
5th paragraph |
wrong: np.sum((inches > 0) & (inches < 0.2))
correct: np.sum((inches > 0) & (inches < 0.1))
|
Anonymous |
Nov 02, 2019 |
Printed |
Page 75
2nd paragraph |
where reads:
"...the equivalence of A AND B and NOT (A OR B)..."
should be read:
"...the equivalence of A AND B and NOT ((NOT A) OR (NOT B))..."
|
Pedro Sousa |
Feb 04, 2020 |
ePub |
Page 80
Example: Binning Data |
For example, imagine we have 1,000 values
...
x = np.random.randn(100)
---
Both should be 100 or 1000
|
Gregory Sherman |
Dec 12, 2018 |
PDF |
Page 82
In[17] |
In[17]: plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1], facecolor='none', s=200);
The code above won't show the large circles on the plot, as it is missing "edgecolor" . It is corrected by the following code:
In[17]: plt.scatter(selection[:, 0], selection[:, 1], facecolor='none', edgecolor='b', s=200);
|
utjo3105 |
Apr 08, 2017 |
Printed |
Page 84
The code example in "Binning Data" |
Actually
np.searchsorted(bins, x)
may return an array containing an index equal to 20 (the number of bins). Then np.add.at(counts, i, 1) will raise an error.
This problem doesn't happen here only because [-5, 5] is a large interval and we're lucky that np.random.randn(100) didn't return any number bigger than 5. Of course the probability that np.random.randn(100) would return a number larger than 5 is small, but it's not zero.
How to prove there's a problem?
Say we try the same example using
bins = np.linspace(-1.5, 1.5, 20)
instead of
bins = np.linspace(-5, 5, 20)
Then the problem does manifest itself.
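One way to make the example robust (a sketch, not the book's fix) is to give counts an extra overflow slot, since np.searchsorted can return len(bins) for values beyond the last edge:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
bins = np.linspace(-1.5, 1.5, 20)  # the narrow interval that exposes the bug

i = np.searchsorted(bins, x)  # indices may range from 0 up to len(bins)

# Allocate one extra slot so out-of-range values land in an overflow bucket
# instead of raising an IndexError inside np.add.at:
counts = np.zeros(len(bins) + 1, dtype=int)
np.add.at(counts, i, 1)

print(counts.sum())  # 100
```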
|
Peter Petrov |
Mar 30, 2022 |
PDF |
Page 89
First line of code |
In[14]: X = rand.rand(10, 2)
Should be
In[14]: X = np.random.random((10, 2))
|
Anonymous |
Mar 10, 2019 |
PDF |
Page 93
3rd paragraph |
>Using the equivalence of A AND B and NOT (A OR B)
This is not equivalent.
A = true
B = true
A and B = true
NOT (A OR B) = NOT (true) = false
The example that follows is also incorrect since it is based off that.
The wanted equivalence is, I suppose,
A and B = NOT (NOT (A) OR NOT (B)) by De Morgan's laws
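The equivalence is quick to verify exhaustively (a minimal check):

```python
import itertools

# A AND B  ==  NOT ((NOT A) OR (NOT B))  for every truth assignment
for a, b in itertools.product([True, False], repeat=2):
    assert (a and b) == (not ((not a) or (not b)))

print("equivalence holds")
```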
|
Anonymous |
May 27, 2019 |
ePub |
Page 94
Series as specialized dictionary |
In[11]: population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population
Out[11]: California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
----
Using version 0.23.4, the keys are output in the same order as in the dictionary
(not alphabetized), so "In[13]: population['California':'Illinois']" retrieves the entire series rather than the first three elements (ordered alphabetically by keys : 'California', 'Florida', 'Illinois')
The same issue is seen in later examples
|
Gregory Sherman |
Dec 12, 2018 |
Printed |
Page 94
top of page |
Wouldn't it be better to say
# Get first element of data
# Get first tuple of data
Because data is a 1D array and has no rows like a 2D array.
The term "row" is misleading here because it implies that we are dealing with a 2D data structure, which is not the case in my opinion.
|
Andrea P. Mathis |
Nov 05, 2019 |
PDF |
Page 95
Second sentence, first paragraph |
I think it is more clear to state that characters "<" and ">" are used to specify the ordering convention for significant "bytes" instead of "bits".
|
Anonymous |
Feb 22, 2017 |
ePub |
Page 103
DataFrame as two-dimensional array |
The ix indexer allows a hybrid of these two approaches:
In[30]: data.ix[:3, :'pop']
--------------------------------------------------------
In pandas 0.23.4, this results in:
"... DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing"
plus the California, Texas, and New York rows
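A possible .loc replacement for the hybrid indexing (a sketch; the index and column names are illustrative, modeled on the book's state data):

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662, 141297],
                     'pop': [38332521, 26448193, 19651127]},
                    index=['California', 'Texas', 'New York'])

# The old hybrid data.ix[:3, :'pop'] can be expressed with .loc alone by
# translating the positional row slice into index labels:
subset = data.loc[data.index[:3], :'pop']
print(subset.shape)  # (3, 2)
```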
|
Gregory Sherman |
Dec 13, 2018 |
PDF |
Page 104
1st paragraph |
The related subsection title at the bottom of page 103 is "DataFrame as specialized dictionary". However, the first paragraph of page 104 reads as follows:
"Because of this, it is probably better to think about DataFrames as generalized
dictionaries rather than generalized arrays, though both ways of looking at the situation can be useful."
The subsection title and its corresponding explanation are somewhat conflicting: specialized vs. generalized.
One of them should be corrected for consistency.
|
Hongsoog Kim |
Aug 21, 2017 |
PDF, ePub |
Page 111
2nd Code Block |
Use of 'axis' instead of 'axes'.
|
David |
May 22, 2017 |
ePub |
Page 117
Explicit MultiIndex constructors |
Similarly, you can construct the MultiIndex directly using its internal encoding by passing levels (a list of lists containing available index values for each level) and labels (a list of lists that reference these labels):
-----------------------------------------------------------------------------------------------------------------
It seems that another word should be in place of the final one ("labels")
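For context, in current pandas the second constructor argument is named codes; the lists of integers are positions into the levels (a minimal sketch, not the book's wording):

```python
import pandas as pd

# levels: the distinct values available at each level
# codes:  integer positions into those levels (called `labels` in older pandas)
mi = pd.MultiIndex(levels=[['a', 'b'], [1, 2]],
                   codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
print(list(mi))  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```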
|
Gregory Sherman |
Dec 19, 2018 |
ePub |
Page 126
pg 126 |
In[8]: df3 = make_df('AB', [0, 1])
df4 = make_df('CD', [0, 1])
print(df3); print(df4); print(pd.concat([df3, df4], axis='col'))
---------------------------------------------------------------------------------
pd.concat() fails ; needs to be "axis='columns'"
|
Gregory Sherman |
Dec 19, 2018 |
ePub |
Page 128
Concatenation with joins |
In[13]: df5 = make_df('ABC', [1, 2])
df6 = make_df('BCD', [3, 4])
print(df5); print(df6); print(pd.concat([df5, df6])
---------------------------------------------------------------------
missing closing parenthesis at end of last print() call
Before third print() output:
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
|
Gregory Sherman |
Dec 19, 2018 |
|
130
Final paragraph on the page |
The sentence reads "Seeing this, you might wonder why would we would bother withhierarchical indexing at all."
I believe that it should be "you might wonder why we would bother" rather than "you might wonder why would we *would* bother."
|
sterlinm |
Jun 01, 2017 |
ePub |
Page 159
In[12] |
monte.str.findall(r'^[^AEIOU].*[^aeiou]$')
--------------------------------------------------------
The case of the vowels in the regular expression makes no difference.
If the first and second character classes are switched, the results are identical.
So, it appears that matching is case-insensitive by default (unlike Python's re) - can it be made case-sensitive?
|
Gregory Sherman |
Dec 25, 2018 |
Printed |
Page 164
2nd paragraph |
planets.groupby('method')['year'].describe().unstack()
I think the unstack method can be omitted to get the DataFrame.
Applying the unstack method yields a (multiindexed) Series in my opinion.
|
Andrea P. Mathis |
Nov 17, 2019 |
Printed |
Page 164-165
last paragraph of 164 |
In the Dispatch methods section, in the code "planets.groupby('method')['year'].describe().unstack()", calling the unstack method (using parentheses) returns a 'pandas.core.series.Series', whereas it should return "<bound method DataFrame.unstack of method". Therefore, the parentheses should be omitted from unstack to get the desired result.
Correct code : planets.groupby('method')['year'].describe().unstack
|
Minhaz Uddin |
Jul 31, 2022 |
Printed |
Page 170
2nd paragraph |
In the section Pivot Tables, the sentence " We have seen how the GroupBy abstraction let us..........", it should be "We have seen how the GroupBy abstraction work, let us.........."
|
Minhaz Uddin |
Jul 31, 2022 |
Printed |
Page 197
In[25], In[26], In[28] |
This:
In[25]: from pandas_datareader import data
goog = data.DataReader('GOOG', start='2004', end='2016', data_source='google')
goog.head( )
In[26]: goog = goog['Close']
In[27]: %matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
In[28]: goog.plot( )
Should be:
In[25]: from pandas_datareader import data
aapl = data.DataReader('AAPL', start='2004', end='2016', data_source='yahoo')
aapl.head( )
In[26]: aapl = aapl['Close']
In[27]: %matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
In[28]: aapl.plot( )
---
Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.
|
Dyanne Ahn |
Aug 14, 2020 |
Printed |
Page 198-199
In[29], In[30] |
This:
In[29]: goog.plot(alpha=0.5, style='-')
goog.resample('BA').mean().plot(style=':')
goog.asfreq('BA').plot(style='--');
plt.legend(['input', 'resample', 'asfreq'],
loc='upper left');
In[30]: fig, ax = plt.subplots(2, sharex=True)
data = goog.iloc[:10]
data.asfreq('D').plot(ax=ax[0], marker='o')
data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);
Should be:
In[29]: aapl.plot(alpha=0.5, style='-')
aapl.resample('BA').mean().plot(style=':')
aapl.asfreq('BA').plot(style='--');
plt.legend(['input', 'resample', 'asfreq'],
loc='upper left');
In[30]: fig, ax = plt.subplots(2, sharex=True)
data = aapl.iloc[:10]
data.asfreq('D').plot(ax=ax[0], marker='o')
data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);
---
Name 'goog' is not defined because Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.
|
Dyanne Ahn |
Aug 14, 2020 |
Printed |
Page 199-200
In[31], In[32] |
This:
In[31]: fig, ax = plt.subplots(3, sharey=True)
goog = goog.asfreq('D', method='pad')
goog.plot(ax=ax[0])
goog.shift(900).plot(ax=ax[1])
goog.tshift(900).plot(ax=ax[2])
local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')
ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[2].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')
ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[2].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')
ax[2].legend(['tshift(900)'], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');
In[32]: ROI = 100 * (goog.tshift(-365) / goog - 1)
ROI.plot()
plt.ylabel('% Return on Investment');
Should be:
In[31]: fig, ax = plt.subplots(3, sharey=True)
aapl = aapl.asfreq('D', method='pad')
aapl.plot(ax=ax[0])
aapl.shift(900).plot(ax=ax[1])
aapl.tshift(900).plot(ax=ax[2])
local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')
ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[2].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')
ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[2].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')
ax[2].legend(['tshift(900)'], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');
In[32]: ROI = 100 * (aapl.tshift(-365) / aapl - 1)
ROI.plot()
plt.ylabel('% Return on Investment');
---
Name 'goog' is not defined because Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.
|
Dyanne Ahn |
Aug 14, 2020 |
PDF |
Page 201
2nd paragraph |
In the following paragraph of Rolling Window
"Rolling statistics are a third type of time series–specific operation implemented by
Pandas. These can be accomplished via the rolling() attribute of Series and DataFrame objects, which returns a view similar to what we saw with the groupby operation (see “Aggregation and Grouping” on page 158). This rolling view makes available a number of aggregation operations by default."
'rolling() attribute' should be corrected to 'rolling() method'
source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html
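For reference, rolling() is indeed a method returning a Rolling object that exposes the aggregations (a minimal sketch):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# rolling(3) returns a Rolling object; .mean() aggregates each 3-value window
print(s.rolling(3).mean().tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```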
|
Hongsoog Kim |
Aug 31, 2017 |
Printed |
Page 201
In[33] |
This:
In[33]: rolling = goog.rolling(365, center=True)
data = pd.DataFrame({'input': goog,
'one-year rolling_mean': rolling.mean(),
'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)
Should be:
In[33]: rolling = aapl.rolling(365, center=True)
data = pd.DataFrame({'input': aapl,
'one-year rolling_mean': rolling.mean(),
'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)
---
Name 'goog' is not defined because Google Finance has discontinued its API, so this feature has been deprecated in the Pandas DataReader.
Therefore, financial data should be imported from Yahoo Finance instead.
|
Dyanne Ahn |
Aug 14, 2020 |
Printed |
Page 205
First Sentence / In[41] |
The first sentence references pd.rolling_mean(), but in In[41] line 2, sum() is called instead. In[41] line 3 then sets plt.ylabel to 'mean hourly count', which seems to be in accordance with the first sentence of the text but in opposition to the given code. This entire paragraph seems mixed up. Clarification?
|
Anonymous |
Aug 21, 2017 |
PDF |
Page 245
First code-block. |
The code listing:
print(df5); print(df6); print(pd.concat([df5, df6])
Is missing a closing parenthesis ')' at the end, to close the print statement.
|
Nikolaj Gilstrøm |
Oct 22, 2020 |
Printed |
Page 263
line 9 |
plt.axes or fig.add_axes() numbers represent: [left, bottom, width, height] but in the book it's written as [bottom, left, width, height].
The reference here: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.figure.Figure.html
|
Anonymous |
Dec 19, 2019 |
Printed |
Page 283
In[3] 2nd line of code |
ax = plt.axes(axisbg='#E6E6E6')
should be
ax = plt.axes(facecolor='#E6E6E6')
Documented on stackoverflow (not by me) at https://stackoverflow.com/questions/50504053/attributeerror-unknown-property-axisbg
|
Mark Pedigo |
Jan 06, 2019 |
Printed |
Page 311
Last paragraph |
Seaborn provides an API on top of Matplotlib that offers sane choices.......
In this paragraph, instead of 'sane', it would be 'same'.
|
Minhaz Uddin |
Aug 14, 2022 |
PDF |
Page 344
last paragraph |
In the last paragraph of page 344:
"Often one point of confusion is how the target array differs from the other features
columns. The distinguishing feature of the target array is that it is usually the quantity we want to predict from the data: in statistical terms, it is the dependent variable. For example, in the preceding data we may wish to construct a model that can predict the species of flower based on the other measurements; in this case, the species column would be considered the feature."
Since the species column contains the value to predict using model, it should be interpreted as target array rather than feature. The last sentence should be corrected as follows:
"For example, in the preceding data we may wish to construct a model that can predict the species of flower based on the other measurements; in this case, the species column would be considered the target array."
|
Hongsoog Kim |
Sep 11, 2017 |
PDF |
Page 350
Last code snippet |
The last piece of code snippet on book is
----------------------
In[14]: plt.scatter(x, y)
plt.plot(xfit, yfit);
----------------------
However, xfit should be Xfit, that is, the x should be UPPERCASE, or the example code will throw an exception.
BTW, this book is great, looking forward to the 2nd edition!
|
Timothy Liu |
Jan 14, 2017 |
PDF |
Page 351
1 paragraph of code |
In[15]: from sklearn.cross_validation import train_test_split
should be replaced with
In[15]: from sklearn.model_selection import train_test_split
since the function train_test_split is part of the module model_selection, not cross_validation!
|
Ahac |
Jun 06, 2020 |
PDF |
Page 358
Code near top of page |
In the code:
ax.imshow(digits.images[i], cmap='binary', interpolation='nearest')
suggest changing 'digits.images[i]' to 'Xtest[i].reshape(8,8)' so that images associated with ytest are displayed.
The writing and explanations in this book are clear and of the highest standard. Thank you!
|
Michael Laszlo |
Mar 31, 2017 |
PDF |
Page 363
1st paragraph |
In the 1st paragraph of page 363,
"Because we have 150 samples, the leave-one-out cross-validation yields scores for 150 trials, and the score indicates either successful (1.0) or unsuccessful (0.0) prediction. Taking the mean of these gives an estimate of the error rate:"
The mean of scores should be interpreted as 'estimate of the prediction accuracy' and corrected as follows:
"Because we have 150 samples, the leave-one-out cross-validation yields scores for 150 trials, and the score indicates either successful (1.0) or unsuccessful (0.0) prediction. Taking the mean of these gives an estimate of the prediction accuracy:
|
Hongsoog Kim |
Sep 12, 2017 |
PDF |
Page 374
Input 18 |
Should read
`from sklearn.model_selection import GridSearchCV`
and not
`from sklearn.grid_search import GridSearchCV`
|
David Lindelof |
Oct 25, 2019 |
PDF |
Page 383
2nd equation |
The denominator of the LHS of the equation should be P (L2 | features)
|
Michele Floris |
Jan 20, 2017 |
Printed |
Page 383
Second formula bellow the second paragraph. |
The equation for a 2-label bayes classifier on page 383 has an incorrect subindex in the denominator of the left hand term which is 1 while it should be 2.
In latex:
The original equation:
\frac{P(L_{1}\mid features)}{P(L_{1}\mid features)} = \frac{P(features \mid L_{1}) \, P(L_{1})}{P(features \mid L_{2}) \, P(L_{2})}
The correct one:
\frac{P(L_{1}\mid features)}{P(L_{2}\mid features)} = \frac{P(features \mid L_{1}) \, P(L_{1})}{P(features \mid L_{2}) \, P(L_{2})}
|
Pablo Lorenzatto |
Mar 28, 2017 |
PDF |
Page 410
Code above Fig. 5-57 |
The `N` parameter of `plot_svm` is not used; the first line of the function should be changed to:
```
def plot_svm(N=10, ax=None):
X, y = make_blobs(n_samples=N, centers=2,
random_state=0, cluster_std=0.60)
```
|
Michele Floris |
Apr 03, 2017 |
PDF |
Page 426
Code snippet |
bag.fit(X,y) is not needed (model.fit is already included in visualize_classifier defined on page 423)
|
Michele Floris |
Apr 13, 2017 |