Errata for Python for Data Analysis

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Page Preface - Acknowledgments
Acknowledgments for Third Edition (2022)

Minor typo: "Programmer" is misspelled.

It has more than a decade since I started writing the first edition of this book and more than 15 years since I originally started my journey as a Python prorammer.

Note from the Author or Editor:
I am fixing the typo

Andy Jessen  Sep 16, 2022 
Page Preface
first paragraph

In the first sentence of the preface here:
wesmckinney.com/book/preface.html

it says:
"This Open Access web version of Python for Data Analysis 3rd Edition is now available as a companion to the print and digital editions. If you encounter any errata, please report them here."
The URL for error reporting is: www.oreilly.com/catalog/errata.csp?isbn=0636920023784

That is the wrong URL. The correct URL is: oreilly.com/catalog/0636920519829/errata

Note from the Author or Editor:
will fix

Anonymous  Sep 25, 2022 
Page Ch 10, Data Aggregation and Group Operations
10.3 Quantile and Bucket Analysis

Error:

As you may recall from Ch 8: Data Wrangling: Join, Combine, and Reshape, pandas has some tools, in particular pandas.cut and pandas.qcut, for slicing data up into buckets with bins of your choosing, or by sample quantiles.

Correct:
As you may recall from Ch 7: Data Cleaning and Preparation, pandas has some tools, in particular pandas.cut and pandas.qcut, for slicing data up into buckets with bins of your choosing, or by sample quantiles.

Reason:
pandas.cut and pandas.qcut are discussed in Ch 7 Section 2, Discretization and Binning.

Note from the Author or Editor:
will fix

Young Tan  Sep 26, 2022 
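
Illustration for the erratum above (not part of the book's text): a minimal sketch of the binning that `pandas.cut` performs. The ages and bin edges are made-up sample data chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# pandas.cut assigns each value to one of the explicitly chosen bins.
# By default the intervals are closed on the right: (18, 25], (25, 35], ...
age_categories = pd.cut(ages, bins)

# Integer code of the bin each age fell into.
print(age_categories.codes)

# Number of observations per bin.
print(pd.Series(age_categories).value_counts())
```

`pandas.qcut` takes the same kind of input but picks the bin edges from sample quantiles instead of from a list you supply.
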
Page "Reading Text Files in Pieces" in 6.1
4th paragraph

"The elipsis marks" should be "The ellipsis marks".

Note from the Author or Editor:
will fix

Noritada Kobayashi  Oct 10, 2022 
Page "Regular Expressions" in 7.4
1st paragraph in p.231

Original:
the match object can only tell us the start and end position of the pattern in the string:

Suggestion for improvement:
the match object can tell us the start and end position of the pattern in the string:

Reason:
As the code block that follows indicates, the string representation of the match object includes information about the matched substring in addition to the start and end positions:

Out[174]: <re.Match object; span=(5, 20), match='dave@google.com'>

Note from the Author or Editor:
will fix

Noritada Kobayashi  Nov 06, 2022 
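
Illustration for the erratum above: a small sketch using the standard-library `re` module, showing that a match object exposes the matched substring as well as its start and end positions. The sample text and the email pattern are assumptions picked to reproduce the span quoted in the report.

```python
import re

# Sample text chosen so the match lands at span (5, 20).
text = "Dave dave@google.com"
pattern = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"
regex = re.compile(pattern, flags=re.IGNORECASE)

m = regex.search(text)

print(m.start(), m.end())         # positions of the match
print(m.group())                  # the matched substring itself
print(text[m.start():m.end()])    # same substring, recovered by slicing
```
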
Page "String Functions in pandas" in 7.4
Table 7-6

Error:
Equivalent to built-in str.alnum

Correct:
Equivalent to built-in str.isalnum

Reason:
See the online documentation of Python.

Note from the Author or Editor:
will fix

Noritada Kobayashi  Nov 12, 2022 
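
Illustration for the erratum above: a minimal sketch comparing the pandas string method `Series.str.isalnum` with the built-in `str.isalnum`. The sample strings are invented for illustration.

```python
import pandas as pd

# Hypothetical sample strings.
words = pd.Series(["abc123", "hello world", "42", ""])

# Vectorized pandas method.
mask = words.str.isalnum()

# Built-in method applied element by element.
builtin = [w.isalnum() for w in words]

print(mask.tolist())
print(builtin)
```
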
Page Table 6-2, page 257
Argument: skip_footer

In pandas.read_csv(), the argument "skip_footer" has been deprecated.

It's now "skipfooter".

Note from the Author or Editor:
will fix

Anonymous  Nov 26, 2022 
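
Illustration for the erratum above: a minimal sketch of the `skipfooter` argument to `pandas.read_csv`. The in-memory CSV text is made up, and `engine="python"` is passed explicitly because the default C parser does not support `skipfooter`.

```python
import io

import pandas as pd

# Hypothetical CSV content with one trailing summary line to ignore.
raw = "a,b\n1,2\n3,4\nTotal rows: 2"

# skipfooter=1 drops the last line of the file before parsing.
df = pd.read_csv(io.StringIO(raw), skipfooter=1, engine="python")
print(df)
```
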
Page "Saving Plots to File" in 9.1
1st paragraph

Error:
You can save the active figure to file using the figure object’s savefig instance method.

Correct:
You can save the figure to file using the figure object’s savefig instance method.

Reason:
In the 2nd ed., the target of the operation was an active figure since the section described `plt.savefig`, but in the 3rd ed., since the `savefig` instance method of a figure object is described, I think the target of the operation does not need to be active.

Note from the Author or Editor:
I am removing "active" from the text

Noritada Kobayashi  Jan 07, 2023 
Page "Saving Plots to File" in 9.1
Table 9-2

Error:
`facecolor, edgecolor`
The color of the figure background outside of the subplots; `"w"` (white), by default.

Correct:
The color of the figure background outside of the subplots; default to `rcParams["savefig.facecolor"]` and `rcParams["savefig.edgecolor"]`, both of which default to `"auto"` (facecolor and edgecolor of the current figure).

Reason:
The default changed from matplotlib 3.3.

Note from the Author or Editor:
I'm removing the part about the default altogether since it's pretty in the weeds

Noritada Kobayashi  Jan 07, 2023 
Page "Quantile and Bucket Analysis" in 10.3
paragraph spanning p. 339 and p. 340

Error:
We can pass `4` as the number of bucket compute sample quartiles, and pass `labels=False` to obtain just the quartile indices instead of intervals:

Suggestion for improvement:
We can pass `4` as the number of bucket to compute sample quartiles, and pass `labels=False` to obtain just the quartile indices instead of intervals:

Reason:
"to" may be missing.

Note from the Author or Editor:
I am adding the missing "to"

Noritada Kobayashi  Jan 14, 2023 
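
Illustration for the erratum above: a minimal sketch of passing `4` and `labels=False` to `pandas.qcut`. The input values are an assumption chosen so that each quartile holds the same number of points.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the integers 1 through 12.
values = pd.Series(np.arange(1, 13))

# 4 buckets means quartiles; labels=False returns the quartile index
# (0 to 3) of each value instead of an Interval.
quartile_index = pd.qcut(values, 4, labels=False)
print(quartile_index.tolist())
```
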
Page "Adding legends" in "Ticks, Labels, and Legends" in 9.1
Blocks above and below Figure 9-10

Error:
In [50]: ax.legend()

The `legend` method has several other choices for the location `loc` argument. See the docstring (with `ax.legend?`) for more information.

The `loc` legend option tells matplotlib where to place the plot. The default is `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`.

Correct:
In [50]: ax.legend()

The `legend` method can take the `loc` option, which instructs matplotlib where to place the legend in the plot. The `loc` option defaults to `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`. The `legend` method has several other choices for the `loc` argument. See the docstring (with `ax.legend?`) for more information.

Reason:
In the 2nd ed., the author passed `loc="best"` as the argument to `legend` in the code block 50, so readers could read the subsequent sentences under the assumption that the `loc` option could be passed. In the 3rd ed., the `loc` option is not passed to `legend` in the code block 50, so the explanation of the `loc` option seems abrupt.

Note from the Author or Editor:
I revised the text in this section

Noritada Kobayashi  Jan 07, 2023 
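
Illustration for the erratum above: a minimal matplotlib sketch of the `loc` option and of excluding an element with `label="_nolegend_"`. The plotted data is invented, and the `Agg` backend is an assumption made only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hypothetical lines; only the first two should appear in the legend.
ax.plot([0, 1, 2], [0, 1, 4], label="one")
ax.plot([0, 1, 2], [0, 2, 3], label="two")
ax.plot([0, 1, 2], [1, 1, 1], label="_nolegend_")  # excluded from the legend

# loc="best" is the default; it is spelled out here to show the option.
legend = ax.legend(loc="best")
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```
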
Page "Exponentially Weighted Functions" in 11.7
3rd paragraph in p.400

Error:
with an exponentially weighted (EW) moving average with `span=60`

Correct:
with an exponentially weighted (EW) moving average with `span=30`

Reason:
The code uses `span=30`, and the 1st paragraph explains that specifying a `span` makes the result comparable to a simple moving window function with a window size equal to the span.

Note from the Author or Editor:
I'm fixing this in the text

Noritada Kobayashi  Mar 05, 2023 
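
Illustration for the erratum above: a minimal sketch that pairs a 30-period simple moving average with an exponentially weighted one using `span=30`. The random-walk series is synthetic and stands in for the price data used in the book.

```python
import numpy as np
import pandas as pd

# Synthetic random walk on business days (not the book's data).
rng = np.random.default_rng(0)
prices = pd.Series(
    rng.standard_normal(250).cumsum(),
    index=pd.date_range("2006-01-01", periods=250, freq="B"),
)

# Equal-weighted 30-period moving average.
ma30 = prices.rolling(30, min_periods=20).mean()

# Exponentially weighted moving average with a matching span.
ewma30 = prices.ewm(span=30).mean()

print(ma30.tail(3))
print(ewma30.tail(3))
```
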
Safari Books Online
chapter 2
Chapter 2

can vs cann


Python Language Basics, IPython, and Jupyter Notebooks

Built-in Data Structures, Functions, and Files

"To check if two variables refer to the same object, use the is keyword. is not cann analogously be used to check that two objects are not the same:"

Note from the Author or Editor:
Corrected before publication. Thank you!

Anonymous  Dec 13, 2021  Aug 12, 2022
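
Illustration for the sentence quoted above: a minimal sketch of `is` and `is not`, which test object identity rather than equality. The variable names are chosen for illustration.

```python
a = [1, 2, 3]
b = a          # b refers to the same list object as a
c = list(a)    # c is a new list with equal contents

same_object = a is b
different_object = a is not c
equal_contents = a == c

print(same_object, different_object, equal_contents)
```
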
Other Digital Version
§2.3
Language Semantics\Binary operators and comparisons

"Python Language Basics, IPython, and Jupyter Notebooks
...
Language Semantics
...
Binary operators and comparisons
Most of the binary math operations and comparisons use familiar mathematical syntax used in other programming langauges:"

"languages" instead of "langauges"

Note from the Author or Editor:
Corrected before publication. Thank you!

Oussama Kiassi  Jan 12, 2022  Aug 12, 2022
Other Digital Version
1.2 Why Python for Data Analysis?, Solving the “Two-Language” Problem
Second paragraph

The first sentence of the paragraph lacks a verb:

"Over the last decade some new approaches to solving the "two-language" problem, such as the Julia programming language."

Note from the Author or Editor:
Corrected before publication. Thank you!

Ali Rahmjoo  Feb 15, 2022  Aug 12, 2022
Page 7.5 Categorical Data
page 391

The input in [248] gives an error.

Here is the correct input:

%time
labels.astype('category')

Note from the Author or Editor:
fixing the code example

Marjorie Curry  Oct 30, 2022 
Page 11.6 Resampling and Frequency Conversion
Table 11-5

Expression:
Axis to resample on; default `axis=0`

Suggestion for improvement:
Axis to resample on; default `axis="index"`

Reason:
This is not a mistake, but since the 3rd edition seems to unify the specification of axis in pandas with `"index"` and `"columns"` instead of numbers, the specification with numbers may surprise the reader a little.

Note from the Author or Editor:
I am fixing in text

Noritada Kobayashi  Mar 03, 2023 
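
Illustration for the erratum above: a minimal sketch showing that the string and integer axis spellings are interchangeable in pandas. It uses `DataFrame.sum` on made-up data rather than `resample`, purely to keep the example short.

```python
import pandas as pd

# Hypothetical data.
frame = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis="index" and axis=0 both reduce down the rows.
by_label = frame.sum(axis="index")
by_number = frame.sum(axis=0)

# axis="columns" (equivalently axis=1) reduces across each row.
across = frame.sum(axis="columns")

print(by_label.equals(by_number))
print(across.tolist())
```
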
Page 11.6 Resampling and Frequency Conversion
Table 11-5

Error:
`fill_method` How to interpolate when upsampling, as in `"ffill"` or `"bfill"`; by default does no interpolation

Correct:
(deletion of description)

Reason:
This option has been removed from API in pandas v0.18.0. See doc/source/whatsnew/v0.18.0.rst in the pandas repository.

Note from the Author or Editor:
Removing from text

Noritada Kobayashi  Mar 04, 2023 
Page 11.6 Resampling and Frequency Conversion
Table 11-5

Error:
`limit` When forward or backward filling, the maximum number of periods to fill

Correct:
(deletion of description)

Reason:
This option has been removed from API in pandas v0.18.0. See doc/source/whatsnew/v0.18.0.rst in the pandas repository.

Note from the Author or Editor:
Removing in text

Noritada Kobayashi  Mar 04, 2023 
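
Illustration for the two errata above: a minimal sketch of how filling is expressed now that `fill_method` and `limit` are no longer arguments to `resample`. Filling is a method on the resampler object, and `limit` is passed to that method. The weekly series is invented for illustration.

```python
import pandas as pd

# Hypothetical weekly series with two observations (both Wednesdays).
weekly = pd.Series(
    [1.0, 2.0],
    index=pd.date_range("2000-01-05", periods=2, freq="W-WED"),
)

# Upsample to daily frequency and forward fill at most 2 periods.
daily = weekly.resample("D").ffill(limit=2)
print(daily)
```
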
Page 11.7 Moving Window Functions
1st paragraph in p.399

Expression:
The `rolling` function also accepts a string indicating a fixed-size time offset rolling() in moving window functions rather than a set number of periods.

Reason:
The meaning of "rolling() in moving window functions", which was inserted in the 3rd edition, is difficult to understand. In the 2nd edition, the corresponding sentence was as follows:

The `rolling` function also accepts a string indicating a fixed-size time offset rather than a set number of periods.

Note from the Author or Editor:
This "rolling() in moving window functions" piece was inserted in the text by the indexer in error. It can either be removed or converted into its proper indexterm form

Noritada Kobayashi  Mar 05, 2023 
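
Illustration for the erratum above: a minimal sketch contrasting a time-offset window with a fixed number of periods in `rolling`. The irregularly spaced series is made up so that the two window types give different results.

```python
import pandas as pd

# Hypothetical series with a gap between Jan 2 and Jan 5.
ts = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-06"]
    ),
)

# Window defined by a time offset: only observations from the
# preceding 2 days are included, however many that is.
by_offset = ts.rolling("2D").mean()

# Window defined by a set number of periods: always the last 2 rows.
by_count = ts.rolling(2).mean()

print(by_offset)
print(by_count)
```
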
Page 88
Table 4-1, third entry, 'arange'

The Python built-in range() function does not return a list but a generator.

Note from the Author or Editor:
fixing

Claas Rostock  Dec 26, 2022 
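
Illustration for the erratum above: a minimal sketch of what `range` and `numpy.arange` return. Strictly speaking, in Python 3 `range` returns a `range` object, a lazy sequence that supports `len` and indexing, rather than a list or a generator.

```python
import numpy as np

r = range(5)
range_type = type(r).__name__        # name of the built-in's return type
as_list = list(r)                    # values are produced on demand

arr = np.arange(5)
array_type = type(arr).__name__      # NumPy's counterpart returns an array

print(range_type, as_list)
print(array_type, arr)
```
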
Page 104
Table 4-3

`uniform` appears twice in the table.

Note from the Author or Editor:
fixing

Claas Rostock  Dec 26, 2022 
Page 133
first paragraph and code block

"If you assign a Series, its labels will be realigned exactly to the DataFrame's index ..."

In[65]: val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])"

This does not demonstrate any matching of frame2's index to the Series index.
It would be more informative as something like '... index=["two", 4, "five"]'

Note from the Author or Editor:
I am fixing the code example

Gregory Sherman  Feb 21, 2023 
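
Illustration for the erratum above: a minimal sketch of index alignment when a Series is assigned to a DataFrame column. The frame, its string index, and the values are hypothetical and differ from the book's `frame2`; they are chosen so that one label matches and one does not.

```python
import pandas as pd

# Hypothetical frame with a string index.
frame2 = pd.DataFrame(
    {"state": ["Ohio", "Ohio", "Nevada"], "pop": [1.5, 1.7, 2.4]},
    index=["one", "two", "three"],
)

# "two" exists in the frame's index; "four" does not.
val = pd.Series([-1.2, -1.5], index=["two", "four"])

# Values are aligned on labels: matching labels are filled in,
# missing labels become NaN, and unmatched Series labels are dropped.
frame2["debt"] = val
print(frame2)
```
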
Page 176
mid

You write:
"
Indexing
Can treat one or more columns as the returned DataFrame..
"
Is this correct, or did you mean "treat .. as index of the returned DataFrame"?

Note from the Author or Editor:
fixing

Claas Rostock  Dec 28, 2022 
Page 207
paragraph at top & [38]

"Suppose you want to keep only rows containing at most a certain number of missing observations. You can indicate this with the thresh argument."

In fact, as command[38] shows, with thresh=2, only rows with <2 missing values were kept.
In the first sentence of the page, "at most" can be replaced with "less than".

Note from the Author or Editor:
I am correcting to "less than" in the text

Gregory Sherman  Mar 03, 2023 
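
Illustration for the erratum above: a minimal sketch of `dropna` with `thresh`. The `thresh` argument is the minimum number of non-missing values a row needs in order to be kept, so with three columns `thresh=2` keeps rows with fewer than 2 missing values. The frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows have 0, 1, 2, and 1 missing values.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [1.0, 2.0, np.nan, 4.0],
        "c": [1.0, 2.0, 3.0, np.nan],
    }
)

# Keep rows that have at least 2 non-missing values.
kept = df.dropna(thresh=2)
print(kept)
```
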
Page 274 (third edition)
In [151]:

The line "In [151]: ..." appears to be superfluous, a holdover from the second edition.

Note from the Author or Editor:
will fix

Michael VanValkenburgh  Nov 29, 2022 
Page 301 (third edition)
Table 9-4

In Table 9-4, I believe the argument is "layout" (singular).

Note from the Author or Editor:
will fix

Michael VanValkenburgh  Nov 29, 2022 
Page 317
first sentence in 9.3

"there [are] many options..." (insert "are")

Note from the Author or Editor:
will fix

Michael VanValkenburgh  Nov 30, 2022