Python for Data Analysis

Errata for Python for Data Analysis, Third Edition


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version Location Description Submitted by Date submitted
Other Digital Version Preface
Using Code Examples

The words are the wrong way around:

You can data find files

should be

You can find data files

Steven Mooney  Feb 16, 2024 
Printed, ePub Page Section 3.1, page 59
1st paragraph

The example:
In [118]: hash("string")
Out [118]: 3634226001988967898

However, when I ran it, I got inconsistent results from the hash function.
Below are examples of the result from running the function 4 consecutive times:

Thus this function could not be used to verify that the object "string" could be used as a dictionary key.

I am using a 2021 iMac with an Apple M1 chip, 16 GB memory, and macOS Sonoma 14.2.1

I am using PyCharm 2023.3.3 (Community Edition)
Build #PC-233.13763.11, built on January 25, 2024
Runtime version: 17.0.9+7-b1087.11 aarch64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.
macOS 14.2.1
GC: G1 Young Generation, G1 Old Generation
Memory: 2048M
Cores: 8
Metal Rendering is ON
Non-Bundled Plugins: (2024.1-2023.3-882)

Patrick Salkeld  Feb 16, 2024 
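
The varying values reported above match Python's hash randomization: since Python 3.3, string hashes are salted per interpreter process unless the PYTHONHASHSEED environment variable fixes the seed. Within a single process the value is stable, which is all a dictionary needs. The sketch below illustrates this; the helper names `is_hashable` and `hash_in_new_process` are hypothetical and do not come from the book.

```python
import os
import subprocess
import sys


def is_hashable(obj):
    """Return True if hash(obj) succeeds, i.e. obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def hash_in_new_process(seed):
    """Run hash("string") in a fresh interpreter with PYTHONHASHSEED set to seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash("string"))'],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Within one process the hash is stable, so dictionary lookups work.
same_within_process = hash("string") == hash("string")

# With a fixed seed the value is also reproducible across processes.
reproducible = hash_in_new_process(0) == hash_in_new_process(0)

print(is_hashable("string"), is_hashable([1, 2]))
print(same_within_process, reproducible)
```

The check that matters for dictionary keys is whether `hash` raises an exception, not which integer it returns.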
Other Digital Version Section 5.2; Indexing, Selection, and Filtering
Selection on DataFrame with loc and iloc

The word rows is misspelled as "roles".

The result of selecting a single row is a Series with an index that contains the DataFrame's column labels. To select multiple roles, creating a new DataFrame, pass a sequence of labels:

Andrei  Feb 17, 2024 
Other Digital Version Generator expressions
3rd code listing

Syntax typo in the statement `dict((i, i **2) for i inrange(5))`:
there should be a space between the keyword `in` and the built-in `range`.

Ben To  Feb 19, 2024 
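
For reference, a minimal version of the statement with the space restored, alongside an equivalent dictionary comprehension (the variable names are illustrative only):

```python
# Generator expression passed to dict(), with the space between `in` and `range`.
squares = dict((i, i ** 2) for i in range(5))

# The same mapping written as a dictionary comprehension.
squares_comp = {i: i ** 2 for i in range(5)}

print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```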
Other Digital Version Set
hashable set elements part

A space is missing before the **first parenthesis** in the sentence "set elements generally must be immutable, and they must be hashable(which means that calling hash on a value does not raise an exception)."

Ben To  Feb 19, 2024 
Printed, ePub Page Page 98, Section 4.1
First example, first 3 paragraphs

When I tried to reproduce this example:
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = ([[4, 7], [0,2], [-5, 6], [0, 0],[1, 2], [-12, -4], [3, 4]])
names == "Bob"
data[names == "Bob"]

I got this error:
Traceback (most recent call last):
File "/Volumes/Extreme SSD/Python Data Analysis/Python3_for_Data_Analysis/", line 550, in <module>
data[names == "Bob"]
TypeError: only integer scalar arrays can be converted to a scalar index

This contradicts the subsequent text which states:
"...You can even mix and match Boolean arrays with slices or integers (or sequences of integers; more on this later)."

Patrick Salkeld  Feb 19, 2024 
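
In the code as submitted above, `data` is a plain Python list of lists, not an ndarray, and indexing a list with a Boolean NumPy array raises this TypeError. A sketch of the same example, assuming `data` is wrapped in `np.array`:

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
# data must be an ndarray for Boolean indexing to work.
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

mask = names == "Bob"   # Boolean array, True at positions 0 and 3
selected = data[mask]   # rows of data where the mask is True

print(selected)
```

With an ndarray, the rows at positions 0 and 3 are returned, which is consistent with the book's description of Boolean indexing.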
Other Digital Version Chapter 4 - Data Types for ndarrays
Second note box

Where the online text says "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers", the phrase "nonzero integers" should be "non-negative integers".

Ben To  Mar 04, 2024 
O'Reilly learning platform Page Chapter 10.x
Throughout the chapter

Chapter 10 uses DataFrame.groupby(...,axis="columns") on several occasions, which is deprecated.

Jochen Schüttler  Apr 09, 2024 
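
One commonly suggested replacement for grouping along columns is to transpose the frame, group without the `axis` argument, and transpose back. The sketch below assumes a small made-up frame and column-to-group mapping; neither is taken from Chapter 10.

```python
import pandas as pd

# Hypothetical data: three columns mapped onto two groups.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
mapping = {"a": "red", "b": "red", "c": "blue"}

# Transpose so the columns become the index, group by the mapping,
# aggregate, then transpose back to the original orientation.
by_column = df.T.groupby(mapping).sum().T

print(by_column)
```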
Other Digital Version Chapter 4, Section "Data Types for ndarrays"
The second Note (after Table 4.2)

It says:
"A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers."

It should say:
"A signed integer can represent both positive and negative integers, while an unsigned integer can only represent non-negative integers, including zero."

Alessandro Botelho Bovo  Jun 06, 2024 
Other Digital Version Chapter 2, section "Numeric types"
3rd paragraph

It says:
"Integer division not resulting in a whole number will always yield a floating-point number"

It should say:
"Integer division will always yield a floating-point number"

Alessandro Botelho Bovo  Jun 06, 2024 
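
A quick check of the behaviour this report refers to: in Python 3, the `/` operator returns a float even when the result is a whole number, while `//` performs floor division.

```python
exact = 4 / 2     # true division of ints yields a float even when exact
inexact = 3 / 2   # 1.5
floored = 3 // 2  # floor division of ints yields an int

print(exact, type(exact).__name__)      # 2.0 float
print(floored, type(floored).__name__)  # 1 int
```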
Other Digital Version Chapter 4, Section "Unique and Other Set Logic"
1st paragraph

It says: "NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is numpy.unique, which returns the sorted unique values in an array:"

The sentence might imply that `numpy.unique` only works for one-dimensional arrays, which is not true. The `numpy.unique` function also works for n-dimensional arrays, although by default it flattens the array to one dimension before finding the unique values.

Alessandro Botelho Bovo  Jun 11, 2024 
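
A short sketch of the behaviour described in this report, using a made-up two-dimensional array: by default `numpy.unique` flattens its input, and the `axis` argument makes it return unique rows instead.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [3, 1, 2],
                [2, 2, 5]])

flat_unique = np.unique(arr)         # flattens first, then sorts unique values
row_unique = np.unique(arr, axis=0)  # unique rows, sorted lexicographically

print(flat_unique)  # [1 2 3 5]
print(row_unique)
```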
ePub Page Chapter 3, List
Discussion regarding "Extend"

Document at

In the discussion of "Extend", the text compares extend to "+" with adding a multi-element list in _one_ move to another multi-element list.

However, when discussing performance, the text describes adding the multi-element list in _n_ moves where _n_ is the length of the list being added, using a for loop. There seems to be little point to using either "extend" or "+" to add one element at a time to a list. One might as well use "append", which would make the code easier to understand.

Steven O. Ellis  Jul 07, 2024 
Other Digital Version 1.4 Installation and Setup
Installing Necessary Packages

On Windows, substitute a carat ^ for the line continuation \ used on Linux and macOS.
"carat" should be "caret", right?

Anonymous  May 15, 2024 
ePub Page 3.1, List
Discussion of "Extend"

Please disregard the errata I just submitted. I missed that the example was a list of lists. The text makes perfect sense.

Steven O. Ellis  Jul 07, 2024 
Other Digital Version 4.2 Pseudorandom Number Generation
Table 4.3: NumPy random number generator methods

duplicate `uniform` function listed in the table

Ben To  Mar 09, 2024 
Other Digital Version 4.4 Array-Oriented Programming with Arrays
first code listing

In [169]: points = np.arange(-5, 5, 0.01) # 100 equally spaced points

But this results in "1000" points.

Ben To  Mar 11, 2024 
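
The count can be checked directly. The sketch below also shows `np.linspace`, which takes the number of points as an argument and includes the endpoint; it is offered only as a contrast, not as what the book intends.

```python
import numpy as np

points = np.arange(-5, 5, 0.01)   # step of 0.01 over a range of width 10
spaced = np.linspace(-5, 5, 100)  # exactly 100 points, endpoint 5 included

print(len(points))  # 1000
print(len(spaced))  # 100
```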
ePub Page 5 Indexing, Selection and Filtering
Using Code Examples

In the following sentence, should 'columns' be changed to 'rows'? When I test this, it prints 2 rows and all the columns.

The row selection syntax data[:2] is provided as a convenience. Passing a single element or a list to the [] operator selects columns.

Steven Mooney  Feb 21, 2024 
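
The two behaviours in the quoted sentence can be told apart with a small frame. This is a sketch with made-up data: a slice inside `[]` selects rows, while a single label or a list of labels selects columns.

```python
import pandas as pd

data = pd.DataFrame(
    {"one": [0, 4, 8], "two": [1, 5, 9], "three": [2, 6, 10]},
    index=["Ohio", "Colorado", "Utah"],
)

first_rows = data[:2]               # slice: first two rows, all columns
one_col = data["two"]               # single label: one column as a Series
two_cols = data[["three", "one"]]   # list of labels: columns as a DataFrame

print(first_rows.shape)  # (2, 3)
print(two_cols.shape)    # (3, 2)
```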
ePub Page 7.1.1 Filtering Out Missing Data
6th Paragraph and [38]

"Suppose you want to keep only rows containing at most a certain number of missing observations. You can indicate this with the thresh argument:"

The thresh argument to pandas.DataFrame.dropna() does not govern how many NA values are allowed.
Instead, it requires at least that many non-NA values to be present in a row for the row to be kept.

Anonymous  May 07, 2024 
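
A minimal sketch of the `thresh` semantics described in this report, using a made-up frame: a row is kept only if it has at least `thresh` non-NA values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, np.nan, np.nan],
})

# Non-NA counts per row are 3, 2, 0 and 2.
kept_2 = df.dropna(thresh=2)  # keeps rows with at least 2 non-NA values
kept_3 = df.dropna(thresh=3)  # keeps rows with at least 3 non-NA values

print(kept_2.index.tolist())  # [0, 1, 3]
print(kept_3.index.tolist())  # [0]
```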
O'Reilly learning platform Page 10.2
6th code box, In [72]

The code example is "grouped_pct.agg([("average", "mean"), ("stdev", np.std)])". There is a FutureWarning to use "grouped_pct.agg([("average", "mean"), ("stdev", "std")])" instead.

Jochen Schüttler  Apr 09, 2024 
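
A sketch of the string-based form mentioned in this report. The small `tips` frame is made up for illustration and is not the book's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the book's tipping data.
tips = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "tip_pct": [0.10, 0.20, 0.15, 0.25],
})
grouped_pct = tips.groupby("day")["tip_pct"]

# (name, function) tuples set the output column names; passing "std"
# as a string instead of np.std is the form the warning recommends.
summary = grouped_pct.agg([("average", "mean"), ("stdev", "std")])

print(summary)
```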
Other Digital Version 13.3 US Baby Names In[116]
China edition page415

Based on the code block above (def~~), the table shown for names in Out[116] may be wrong.
It should be:
                       name sex  births  year      prop
year sex
1880 F   0                Mary   F    7065  1880  0.077643
         1                Anna   F    2604  1880  0.028618
         2                Emma   F    2003  1880  0.022013
         3           Elizabeth   F    1939  1880  0.021309
         4              Minnie   F    1746  1880  0.019188
...  ... ...               ... ...     ...   ...       ...
2010 M   1690779    Zymaire   M       5  2010  0.000003
         1690780     Zyonne   M       5  2010  0.000003
         1690781  Zyquarius   M       5  2010  0.000003
         1690782      Zyran   M       5  2010  0.000003
         1690783      Zzyzx   M       5  2010  0.000003

Zhang yingtan  Mar 19, 2024 
PDF Page 135
4 & 6

"If a DataFrame’s index and columns have their name attributes set, these will also be displayed:"

Next sentence says: "Unlike Series, DataFrame does not have a name attribute."

One sentence (par. 4) refers to the DataFrame's index and columns as having their name attributes "set", while the next sentence states that a DataFrame "does NOT have a name attribute".

This creates confusion.

Emile Jacques Bosman  May 01, 2024 
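
The two quoted sentences refer to different objects, which a short sketch can make concrete. The frame below is made up for illustration: the `name` attributes belong to the index and the columns objects, while the DataFrame itself has no `name`.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Ohio": [1.5, 1.7], "Nevada": [2.4, 2.9]},
    index=[2001, 2002],
)

# The names live on the Index objects, not on the DataFrame.
frame.index.name = "year"
frame.columns.name = "state"

frame_has_name = hasattr(frame, "name")  # the DataFrame has no name attribute
series_name = frame["Ohio"].name         # a Series does have one

print(frame_has_name)  # False
print(series_name)     # Ohio
```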
Printed Page 169

In[283] and In[285] look exactly the same, even though the line above says that a more concise syntax could be used.

Jude Cancellieri  Mar 09, 2024