Errata

Errata for Python for Data Analysis, Third Edition

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date submitted	Date corrected
	Page 4.4 Array-Oriented Programming with Arrays 1st code block	In [169]: points = np.arange(-5, 5, 0.01) # 100 equally spaced points -> this will return 1000 equally spaced points, not 100	Anonymous	Jan 14, 2024
	Page Section 4.1 - data types for ndarrays second note	In the note it says "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers. For example, int8 (signed 8-bit integer) can represent integers from -128 to 127 (inclusive), while uint8 (unsigned 8-bit integer) can represent 0 through 255." Second part of the first sentence seems incorrect (nonzero integers) It should most likely read ", while an unsigned integer can only represent non-negative integers." The example makes that clear also. Note from the Author or Editor: confirmed, should be "non-negative"	Niclas Ericsson	Nov 28, 2023
	Page https://wesmckinney.com/book/python-builtin#control_exceptions at the first mention of the "finally:" block	The write_to_file() function is not defined. Note from the Author or Editor: write_to_file is a fake function for illustration's sake, but I'll clarify anyway	Sandor Budai	Nov 14, 2023
	Page Chapter 10: Data Aggregation and Group Operations Quantile and Bucket Analysis Section	In the line "pandas has some tools, in particular pandas.cut and pandas.qcut", the referred section is incorrect. Incorrect referred section: "Ch 8: Data Wrangling: Join, Combine, and Reshape" Correct referred section: "Ch 7: Data Cleaning and Preparation" Note from the Author or Editor: I will fix the reference	Thinh Pham	Nov 12, 2023
	Page Creating ndarrays n/a	In the two examples for data type for the array that NumPy creates: In [27]: arr1.dtype Out[27]: dtype('float64') In [28]: arr2.dtype Out[28]: dtype('int64') The output of the dtype for arr2 is not int64 but int32. Note from the Author or Editor: I will add a note that the output might be int32 on some platforms	Jaeeun Choi	Nov 11, 2023
	Page Page 112 2nd Paragraph	Wes, I hope you're doing well bro. Enjoying the paperback of edition 3! This is a minor (possibly negligible) language clarification. Paragraph 2 reads: [Here, arr.mean(axis=1) means "compute mean across the columns," where arr.sum(axis=0) means "compute sum down the rows"]. The choice of wording here is a bit confusing and could potentially be interpreted to mean the opposite of what it is saying. May I suggest, [Here, arr.mean(axis=1) means "compute mean through the rows," where arr.sum(axis=0) means "compute sum through the columns"]. Note from the Author or Editor: I will revise the language to use "over"	Daniel Gala	Nov 04, 2023
	Page 13.1 Bitly Data from 1.USA.gov use the json module and its loads function invoked on each line in the sample file we downloaded	"import json with open(path) as f: records = [json.loads(line) for line in f]", but the loads function cannot be invoked on each line in the sample file; IPython/Jupyter raises an error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 6991: illegal multibyte sequence" Note from the Author or Editor: We need to add encoding="utf-8" when opening the file because this fails in China	Sam Z.H.	Oct 30, 2023
	Page 29 1st paragraph	'If you bind a new object to a variable inside a function, that will not overwrite a variable of the same name in the "scope" outside the function (the "parent scope").' I believe the correct wording is "... that will overwrite a variable ..." as demonstrated in the example given below the paragraph. Note from the Author or Editor: the language is unclear, I will revise	John Maciel	Oct 03, 2023
	Page Chapter 5, Indexing Selection and filtering, Selecting on dataframe with loc and iloc 2nd paragraph	"The result of selecting a single row is a Series with an index that contains the DataFrame's column labels. To select multiple roles, creating a new DataFrame, pass a sequence of labels:" The text "To select multiple roles" should read "To select multiple rows". Note from the Author or Editor: confirmed	Elombat Loic	Sep 05, 2023
	Page Appendices- Advanced Numpy, A3 Broadcasting P 667, 'demean_axis' function code	the last line of function definition of 'demean_axis' should be changed to 'return arr - means[tuple(indexer)]', from 'return arr - means[indexer]'. Note from the Author or Editor: will fix	Lance Lee	Sep 04, 2023
	Page 310 last paragraph	In the text, you say histplot can plot both histogram and density plot simultaneously, but then (in Figure 9-23) you only plot the histogram. I wonder if you intended to use kde=True so that both are plotted. Note from the Author or Editor: You're right, I will fix	Alex Dow	Aug 24, 2023
	Page Chapter 2, Variables and argument passing section 3rd paragraph under the section	"In some languages, the assignment if b will cause the data [1, 2, 3] to be copied." if -> of Note from the Author or Editor: confirmed	Jeremy Hageman	Aug 23, 2023
	Page 11.7 Moving Window Functions 1st paragraph in p.399	Expression: The `rolling` function also accepts a string indicating a fixed-size time offset rolling() in moving window functions rather than a set number of periods. Reason: The meaning of "rolling() in moving window functions", which are inserted in the 3rd edition, seemed to me to be difficult to understand. In the 2nd edition, the sentence corresponding to this sentence was as follows: The `rolling` function also accepts a string indicating a fixed-size time offset rather than a set number of periods. Note from the Author or Editor: This "rolling() in moving window functions" piece was inserted in the text by the indexer in error. It can either be removed or converted into its proper indexterm form	Noritada Kobayashi	Mar 05, 2023
	Page "Exponentially Weighted Functions" in 11.7 3rd paragraph in p.400	Error: with an exponentially weighted (EW) moving average with `span=60` Correct: with an exponentially weighted (EW) moving average with `span=30` Reason: The code states `span=30` and also the 1st paragraph describes that specifying with `span` makes the result comparable to a simple rolling with the same width. Note from the Author or Editor: I'm fixing this in the text	Noritada Kobayashi	Mar 05, 2023
	Page 11.6 Resampling and Frequency Conversion Table 11-5	Error: `limit` When forward or backward filling, the maximum number of periods to fill Correct: (deletion of description) Reason: This option has been removed from API in pandas v0.18.0. See doc/source/whatsnew/v0.18.0.rst in the pandas repository. Note from the Author or Editor: Removing in text	Noritada Kobayashi	Mar 04, 2023
	Page 11.6 Resampling and Frequency Conversion Table 11-5	Error: `fill_method` How to interpolate when upsampling, as in `"ffill"` or `"bfill"`; by default does no interpolation Correct: (deletion of description) Reason: This option has been removed from API in pandas v0.18.0. See doc/source/whatsnew/v0.18.0.rst in the pandas repository. Note from the Author or Editor: Removing from text	Noritada Kobayashi	Mar 04, 2023
	Page 11.6 Resampling and Frequency Conversion Table 11-5	Expression: Axis to resample on; default `axis=0` Suggestion for improvements: Axis to resample on; default `axis="index"` Reason: This is not a mistake, but since the 3rd edition seems to unify the specification of axis in pandas with `"index"` and `"columns"` instead of numbers, the specification with numbers may surprise the reader a little. Note from the Author or Editor: I am fixing in text	Noritada Kobayashi	Mar 03, 2023
	Page 207 paragraph at top & [38]	"Suppose you want to keep only rows containing at most a certain number of missing observations. You can indicate this with the thresh argument." In fact, as command[38] shows, with thresh=2, only rows with <2 missing values were kept. In the first sentence of the page, "at most" can be replaced with "less than". Note from the Author or Editor: I am correcting to "less than" in the text	Gregory Sherman	Mar 03, 2023
	Page 133 first paragraph and code block	"If you assign a Series, its labels will be realigned exactly to the DataFrame's index ..." In[65]: val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])" This does not demonstrate any matching of frame2's index to the Series index. It would be more informative as something like '... index=["two", 4, "five"]' Note from the Author or Editor: I am fixing the code example	Gregory Sherman	Feb 21, 2023
	Page "Quantile and Bucket Analysis" in 10.3 paragraph spanning p. 339 and p. 340	Error: We can pass `4` as the number of bucket compute sample quartiles, and pass `labels=False` to obtain just the quartile indices instead of intervals: Suggestion for improvements: We can pass `4` as the number of bucket to compute sample quartiles, and pass `labels=False` to obtain just the quartile indices instead of intervals: Reason: "to" may be missing. Note from the Author or Editor: I am adding the missing "to"	Noritada Kobayashi	Jan 14, 2023
	Page "Saving Plots to File" in 9.1 Table 9-2	Error: `facecolor, edgecolor` The color of the figure background outside of the subplots; `"w"` (white), by default. Correct: The color of the figure background outside of the subplots; default to `rcParams["savefig.facecolor"]` and `rcParams["savefig.edgecolor"]`, both of which default to `"auto"` (facecolor and edgecolor of the current figure). Reason: The default changed from matplotlib 3.3. Note from the Author or Editor: I'm removing the part about the default altogether since it's pretty in the weeds	Noritada Kobayashi	Jan 07, 2023
	Page "Saving Plots to File" in 9.1 1st paragraph	Error: You can save the active figure to file using the figure object’s savefig instance method. Correct: You can save the figure to file using the figure object’s savefig instance method. Reason: In the 2nd ed., the target of the operation was an active figure since the section described `plt.savefig`, but in the 3rd ed., since the `savefig` instance method of a figure object is described, I think the target of the operation does not need to be active. Note from the Author or Editor: I am removing "active" from the text	Noritada Kobayashi	Jan 07, 2023
	Page "Adding legends" in "Ticks, Labels, and Legends" in 9.1 Blocks above and below Figure 9-10	Error: In [50]: ax.legend() The `legend` method has several other choices for the location `loc` argument. See the docstring (with `ax.legend?`) for more information. The `loc` legend option tells matplotlib where to place the plot. The default is `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`. Correct: In [50]: ax.legend() The `legend` method can take the `loc` option, which instructs matplotlib where to place the legend in the plot. The `loc` option defaults to `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`. The `legend` method has several other choices for the `loc` argument. See the docstring (with `ax.legend?`) for more information. Reason: In the 2nd ed., the author passed `loc="best"` as the argument to `legend` in the code block 50, so readers could read the subsequent sentences under the assumption that the `loc` option could be passed. In the 3rd ed., the `loc` option is not passed to `legend` in the code block 50, so the explanation of the `loc` option seems abrupt. Note from the Author or Editor: I revised the text in this section	Noritada Kobayashi	Jan 07, 2023
	Page 176 mid	You write: " Indexing Can treat one or more columns as the returned DataFrame.. " Is this correct, or did you mean "treat .. as index of the returned DataFrame"? Note from the Author or Editor: fixing	Claas Rostock	Dec 28, 2022
	Page 104 Table 4-3	uniform appears two times in the table Note from the Author or Editor: fixing	Claas Rostock	Dec 26, 2022
	Page 88 Table 4-1, third entry, 'arange'	the Python built-in range() function does not return a list but a generator Note from the Author or Editor: fixing	Claas Rostock	Dec 26, 2022
	Page 317 first sentence in 9.3	"there [are] many options..." (insert "are") Note from the Author or Editor: will fix	Michael VanValkenburgh	Nov 30, 2022
	Page 282 (third edition) second sentence of 9.1	Will you please clarify the difference between %matplotlib inline and %matplotlib notebook ? For example, Figure 9-15 on page 302 works with notebook but is blank with inline, and Figure 9-19 on page 307 works with inline but partially overwrites Figure 9-18 with notebook. Note from the Author or Editor: I will clarify	Michael VanValkenburgh	Nov 30, 2022
	Page 301 (third edition) Table 9-4	In Table 9-4, I believe the argument is "layout" (singular). Note from the Author or Editor: will fix	Michael VanValkenburgh	Nov 29, 2022
	Page 274 (third edition) In [151]:	The line "In [151]: ..." appears to be superfluous---a holdover from the second edition. Note from the Author or Editor: will fix	Michael VanValkenburgh	Nov 29, 2022
	Page Table 6-2, page 257 Argument: skip_footer	In pandas.read_csv(), the argument "skip_footer" has been deprecated. It's now "skipfooter". Note from the Author or Editor: will fix	Anonymous	Nov 26, 2022
	Page 166 (3rd edition) middle of page	It is not true that "if any value is not NA, then the result is NA." Apparently the default is to skip (exclude) NA values. Note from the Author or Editor: Yes, the language needs to be fixed to indicate that the result will be the sum of the non-NA values	Michael VanValkenburgh	Nov 15, 2022
	Page "String Functions in pandas" in 7.4 Table 7-6	Error: Equivalent to built-in str.alnum Correct: Equivalent to built-in str.isalnum Reason: See the online documentation of Python. Note from the Author or Editor: will fix	Noritada Kobayashi	Nov 12, 2022
	Page "Regular Expressions" in 7.4 1st paragraph in p.231	Original: the match object can only tell us the start and end position of the pattern in the string: Suggestion for improvement: the match object can tell us the start and end position of the pattern in the string: Reason: As the code block that follows indicates, the string representation of the match object includes information about the matched substring in addition to the start and end positions: Out[174]: <re.Match object; span=(5, 20), match='dave@google.com'> Note from the Author or Editor: will fix	Noritada Kobayashi	Nov 06, 2022
	Page 7.5 Categorical Data page 391	The input in [248] gives an error. Here is the correct input: %time labels.astype('category') Note from the Author or Editor: fixing the code example	Marjorie Curry	Oct 30, 2022
	Page "A.6 More About Sorting" in Appendix A 2nd code block (program list)	The randomly generated array (as below) is inappropriate as an example, as the first column is in ascending order from the beginning. Therefore, although we want only the first column to be sorted, there is no change in the array before and after sorting, which makes it difficult to convey the intent. It would be an appropriate example if it were generated with other parameters. In [166]: arr = rng.standard_normal((3, 5)) In [167]: arr Out[167]: array([[-1.1956, 0.4691, -0.3598, 1.0359, 0.2267], [-0.7448, -0.5931, -1.055 , -0.0683, 0.458 ], [-0.07 , 0.1462, -0.9944, 1.1436, 0.5026]]) In [168]: arr[:, 0].sort() # Sort first column values in place In [169]: arr Out[169]: array([[-1.1956, 0.4691, -0.3598, 1.0359, 0.2267], [-0.7448, -0.5931, -1.055 , -0.0683, 0.458 ], [-0.07 , 0.1462, -0.9944, 1.1436, 0.5026]]) Note from the Author or Editor: I will improve the example to be more robust to random number generation	Noritada Kobayashi	Oct 30, 2022
	Page "Reading Text Files in Pieces" in 6.1 4th paragraph	"The elipsis marks" should be "The ellipsis marks". Note from the Author or Editor: will fix	Noritada Kobayashi	Oct 10, 2022
	Page Ch 10, Data Aggregation and Group Operations 10.3 Quantile and Bucket Analysis	Error: As you may recall from Ch 8: Data Wrangling: Join, Combine, and Reshape, pandas has some tools, in particular pandas.cut and pandas.qcut, for slicing data up into buckets with bins of your choosing, or by sample quantiles. Correct: As you may recall from Ch 7: Data Cleaning and Preparation, pandas has some tools, in particular pandas.cut and pandas.qcut, for slicing data up into buckets with bins of your choosing, or by sample quantiles. Reason: pandas.cut and pandas.qcut are discussed in Ch 7 Section 2, Discretization and Binning. Note from the Author or Editor: will fix	Young Tan	Sep 26, 2022
	Page Preface first paragraph	In the first sentence of the preface here: wesmckinney.com/book/preface.html it says: "This Open Access web version of Python for Data Analysis 3rd Edition is now available as a companion to the print and digital editions. If you encounter any errata, please report them here." The URL for error reporting is: www.oreilly.com/catalog/errata.csp?isbn=0636920023784 that is the wrong URL. The correct URL is: oreilly.com/catalog/0636920519829/errata Note from the Author or Editor: will fix	Anonymous	Sep 25, 2022
	Page Preface - Acknowledgments Acknowledgments for Third Edition (2022)	Minor typo: "Programmer" is misspelled. It has more than a decade since I started writing the first edition of this book and more than 15 years since I originally started my journey as a Python prorammer. Note from the Author or Editor: I am fixing the typo	Andy Jessen	Sep 16, 2022
Other Digital Version	1.2 Why Python for Data Analysis?, Solving the “Two-Language” Problem Second paragraph	The first sentence of the paragraph lacks a verb: "Over the last decade some new approaches to solving the "two-language" problem, such as the Julia programming language." Note from the Author or Editor: Corrected before publication. Thank you!	Ali Rahmjoo	Feb 15, 2022	Aug 12, 2022
Other Digital Version	§2.3 Language Semantics\Binary operators and comparisons	"Python Language Basics, IPython, and Jupyter Notebooks ... Language Semantics ... Binary operators and comparisons Most of the binary math operations and comparisons use familiar mathematical syntax used in other programming langauges:" "languages" instead of "langauges" Note from the Author or Editor: Corrected before publication. Thank you!	Oussama Kiassi	Jan 12, 2022	Aug 12, 2022
	chapter 2 Chapter 2	can vs cann Python Language Basics, IPython, and Jupyter Notebooks Built-in Data Structures, Functions, and Files "To check if two variables refer to the same object, use the is keyword. is not cann analogously be used to check that two objects are not the same:" Note from the Author or Editor: Corrected before publication. Thank you!	Anonymous	Dec 13, 2021	Aug 12, 2022
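
For the Section 4.4 entry about np.arange(-5, 5, 0.01), a quick check of the point count may help. This is a minimal sketch, assuming NumPy is installed; the variable name follows the quoted code.

```python
import numpy as np

# arange covers the half-open interval [-5, 5) in steps of 0.01,
# so it yields (5 - (-5)) / 0.01 = 1000 points rather than 100.
points = np.arange(-5, 5, 0.01)
print(len(points))
```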
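
For the Section 4.1 entry on signed and unsigned integers, the ranges quoted in the note follow directly from the bit width. The helper below is a hypothetical illustration written in plain Python, not something taken from the book.

```python
def int_range(bits: int, signed: bool) -> tuple:
    """Return the (min, max) values of a fixed-width two's-complement integer."""
    if signed:
        # One bit pattern in two goes to negative values, so the range is asymmetric.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Unsigned integers cover zero and the positive values only (non-negative).
    return 0, 2 ** bits - 1


print(int_range(8, signed=True))   # int8 range
print(int_range(8, signed=False))  # uint8 range
```

Zero is included in the unsigned range, which is why "non-negative" is the accurate wording.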
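
For the Section 13.1 entry, the author's note says to pass encoding="utf-8" when opening the file. A minimal sketch of that fix follows; the temporary file and its records are made-up stand-ins for the book's sample data.

```python
import json
import os
import tempfile

# Hypothetical stand-in data: one JSON record per line, with a non-ASCII value.
records_in = [{"city": "New York"}, {"city": "São Paulo"}]
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    for rec in records_in:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    path = f.name

# An explicit encoding avoids relying on the platform default codec
# (for example 'gbk' on some systems), which is what raised the error.
with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

os.remove(path)
print(records)
```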
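
For the page 29 entry on variable scope, the distinction between rebinding a name and mutating an object may be easier to see in code. This is an illustrative sketch with hypothetical function names, not the book's own example.

```python
a = []


def rebind():
    # Assignment creates a new local name; the outer `a` is not overwritten.
    a = [1, 2, 3]
    return a


def mutate():
    # No assignment here, so `a` refers to the outer list, which is modified.
    a.append(1)


rebind()
print(a)  # still the empty list
mutate()
print(a)  # the outer list now has one element
```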
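
For the Appendix A.3 entry, the corrected demean_axis would look roughly as follows. The function body is reconstructed from the erratum's description, so treat the details other than the final line as an assumption; the sample array is made up.

```python
import numpy as np


def demean_axis(arr, axis=0):
    means = arr.mean(axis)
    # Build an index such as [:, np.newaxis] so `means` broadcasts against `arr`.
    indexer = [slice(None)] * arr.ndim
    indexer[axis] = np.newaxis
    # Indexing requires a tuple; indexing with a plain list is the reported bug.
    return arr - means[tuple(indexer)]


arr = np.arange(12.0).reshape(3, 4)
demeaned = demean_axis(arr, axis=1)
print(demeaned.mean(axis=1))
```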
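
For the page 88 entry on range(), note that in Python 3 the built-in returns neither a list nor, strictly speaking, a generator: it returns a lazy range object that behaves as a sequence. A short check:

```python
r = range(5)

print(type(r).__name__)  # the object's type name
print(len(r), r[2])      # supports len() and indexing, unlike a generator
print(list(r))           # materialize the values only when needed
```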
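
For the Section 7.4 entry on regular expressions, the match object exposes both the span and the matched text. The sketch below uses a shortened, hypothetical input string and an email pattern in the style of the book's example.

```python
import re

text = "Dave dave@google.com"  # hypothetical input
pattern = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", flags=re.IGNORECASE)

m = pattern.search(text)
print(m.span())   # start and end positions of the match
print(m.group())  # the matched substring itself
```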
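
For the page 112 entry on axis wording, a small array makes the direction of each reduction concrete. This is a sketch with made-up values, assuming NumPy is installed.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# axis=1 reduces over the columns, leaving one value per row.
row_means = arr.mean(axis=1)
# axis=0 reduces over the rows, leaving one value per column.
col_sums = arr.sum(axis=0)

print(row_means)
print(col_sums)
```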