Errata

Errata for Python for Data Analysis

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date Submitted	Date Corrected
Printed	Page vi United States	The technical editor Hugh Brown is listed as Hugh White. Not sure of the page number. Note from the Author or Editor: Yes, many apologies. His name is Hugh Brown (and he was a great editor!)	Hugh Brown	Nov 05, 2012	May 17, 2013
	Page New 3/E textbook > Chapter 2 > Variables and argument passing 3rd paragraph	wesmckinney.com/book/python-basics.html#semantics_references The sentence "In some languages, the assignment if b will cause the data..." should probably be "In some languages, the assignment of b will cause the data..." (changed 'if' to 'of'). Note from the Author or Editor: will fix	Aaditya Bugga	Apr 09, 2022
	n/a	In the open access version, when seaborn histplots are plotted, the kde=True argument seems to be missing: wesmckinney.com/book/plotting-and-visualization.html#fig-vis_series_kde Note from the Author or Editor: will fix	Hamed	Apr 22, 2022
	Page chapter 9 first paragraph	In the section Ticks, Labels, and Legends, ax.xlim() does not work; the Axes method is ax.set_xlim(). Note from the Author or Editor: will fix	Levy	May 26, 2022
	Page NA 4.4 Array-Oriented Programming with Arrays	wesmckinney.com/book/numpy-basics.html Current: In [169]: points = np.arange(-5, 5, 0.01) # 100 equally spaced points Proposed fix: In [169]: points = np.arange(-5, 5, 0.01) # 1000 equally spaced points Note from the Author or Editor: will fix	Matt Dahlman	Jun 07, 2022
	Page https://wesmckinney.com/book/preliminaries.html section 1.4 Installation and Setup	Under section 1.4 Installation and Setup, you have the following subheadings: Miniconda on Windows GNU/Linux Miniconda on macOS I think that middle one should be "Miniconda on GNU/Linux" for consistency with the other two. Note from the Author or Editor: will fix	Graeme Richardson	Jun 07, 2022
	Page https://wesmckinney.com/book/pandas-basics.html#pandas_summarize section 5.3 Summarizing and Computing Descriptive Statistics	This statement seems to be incorrect or at least unclear: "When an entire row or column contains all NA values, the sum is 0, whereas if any value is not NA then the result is NA." As seen in the examples, if any value is not NA, the result is a sum. Note from the Author or Editor: Confirmed, I am fixing the language to be correct	Graeme Richardson	Jun 23, 2022
	Page https://wesmckinney.com/book/data-cleaning.html#prep_dummy_vars section on Computing Indicator/Dummy Variables	You have the following note: "For much larger data, this method of constructing indicator variables with multiple membership is not especially speedy. It would be better to write a lower-level function that writes directly to a NumPy array, and then wrap the result in a DataFrame." What is meant by lower-level? A custom C function or a Python function that is more efficient in some way? Given that pandas is presumably written in C, it's surprising that any type of Python function could be faster than pandas. I think this note could be clarified for the reader by saying "not especially speedy because...". For example, is it because using pandas in this way will do too many memory allocations, too many data copies, etc.? Note from the Author or Editor: I'm removing this note	Graeme Richardson	Jun 28, 2022
	Page Merging on Index 3rd block of code in the section	There is a formatting error: starting at pd.DataFrame({"event1": pd.Series([0, 2, 4, 6, 8, 10], dtype="Int64", everything in the code block is displayed in red, which makes it confusing to read. Note from the Author or Editor: will reformat	Enrique M. Muro	Jun 30, 2022
	Page Section 5.2 Selection on DataFrame with loc and iloc	It reads “To select multiple roles“ and it should read “To select multiple rows”. Note from the Author or Editor: will fix	Andres Medaglia	Jul 01, 2022
	Page Section 8.3 Reshaping and Pivoting Pivoting “Long” to “Wide” Format	It reads: "Now, ldata looks like:" I believe it should read: "Now, long_data looks like:" Note from the Author or Editor: will fix	Andres Medaglia	Jul 07, 2022
	Page Selection on DataFrame with loc and iloc Right before Table 5.4 3rd edition online	Note: Resubmitted for clarity. "Boolean arrays can used with loc but not iloc:" As per the documentation, this is not 100% accurate: a boolean ndarray may be passed to iloc. Given mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'a': 100, 'b': 200, 'c': 300, 'd': 400}, {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}] and df = pd.DataFrame(mydict) (columns a, b, c, d; rows 1, 2, 3, 4 / 100, 200, 300, 400 / 1000, 2000, 3000, 4000), the following produces valid output: df.iloc[:, df.columns.isin(['a', 'b'])]. However, the following does not: df.iloc[df['c'] >= 300]. Note from the Author or Editor: I am clarifying that I mean for selecting rows	Mauricio Ruiz	Jul 14, 2022
	Page Python Language Basics > Scalar Types > Strings https://wesmckinney.com/book/python-basics.html#scalar_strings	A mention of an additional line break is missing in the following text: "It may surprise you that this string c actually contains four lines of text; the line breaks after """ and after lines are included in the string. We can count the new line characters with the count method on c:" There are 3 line breaks that should be mentioned: 1) after """ 2) after `that` 3) after `lines` Line breaks 1 and 3 are correctly mentioned, but 2 was omitted and should be included.	Anonymous	Jul 19, 2022
	Page Ch 4: Fancy Indexing The last paragraph before the 5th code chunk	Many users (myself included) may have expected fancy indexing to return a rectangular sub-matrix. Here is one way to get that: Note from the Author or Editor: will reword	Nicholas Vence	Jul 22, 2022
	Page Tab Completion Paragraph 2	Digital version of the book at wesmckinney.com/book/ipython.html. Word "also" repeated twice: "Also, you can also complete methods and attributes on any object after typing a period:" Note from the Author or Editor: will fix	Semyon Bokhankevich	Jul 25, 2022
	Page Data Types for ndarrays General Note	Digital version of the book at wesmckinney.com/book/numpy-basics.html. In the general note, the wording is "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers." Given the code example provided within the note, I think "nonnegative" was meant. An attempt to pass a sequence with a negative number while specifying an unsigned integer data type yields a peculiar result: In [35]: np.array([-1, 0, 1], dtype="u1") Out[35]: array([255, 0, 1], dtype=uint8) Thank you for elaborating on this distinction! Note from the Author or Editor: will fix	Semyon Bokhankevich	Jul 27, 2022
	Page Chapter 2 - dates and times https://wesmckinney.com/book/python-basics.html#scalar_dates	Question marks not replaced with actual reference: "See ??? for a full list of format specifications." Note from the Author or Editor: will fix, this is only on the HTML version	Anonymous	Jul 29, 2022
	Page Filling In Missing Data 3rd paragraph	The ??? is not replaced with the actual reference link. 'The same interpolation methods available for reindexing (see ???) can be used with fillna' Note from the Author or Editor: will fix	Junwei Fang	Aug 11, 2022
	Page Acknowledgements for the Third Edition 1st two lines	Text reads: "It has more than a decade since I started writing the first edition of this book and more than 15 years since I originally started my journey as a Python prorammer." Programmer is missing the 'g' and 'It has more...' should probably read 'It has BEEN more...' Note from the Author or Editor: will fix	Laure Robinson	Aug 29, 2022
	Page https://wesmckinney.com/book/python-builtin.html#comprehensions https://wesmckinney.com/book/python-builtin.html	"we could filter out strings with length 2 or less and convert them to uppercase like this:" does not tie to the code following thereafter: [x.upper() for x in strings if len(x) > 2] ['BAT', 'CAR', 'DOVE', 'PYTHON'] Change to e.g. "filter out strings with length more than 2 and convert them..." Note from the Author or Editor: will fix	Thomas Pfeiffer	Sep 08, 2022
	Page https://wesmckinney.com/book/python-builtin.html https://wesmckinney.com/book/images/pda3_0301.png	pda3_0301.png is apparently missing. Note from the Author or Editor: This has been fixed	brian piercy	Sep 17, 2022
	Page Chapter 2, strings Above output #65	Change "Afer this operation, the variable" to "After this operation, the variable" wesmckinney.com/book/python-basics.html Note from the Author or Editor: will fix	Will Beasley	Sep 24, 2022
	Page Table 2.1: Binary operators Table 2.1: Binary operators	a <= b should also be formatted as inline code, i.e. `a < b, a <= b`. Note from the Author or Editor: will fix	Alen Softić	Jan 11, 2023
	Page Section 7.3, page 227, Table 7-3 Top	StringDtype is missing from the table. Not sure if the table is meant to be exhaustive, but this is an important type that should be included in this table. Note from the Author or Editor: I will add it in a subsequent printing	Kerrick Staley	Jun 06, 2023
Mobi	Page 1 On Kindle: "Location 325 of 13301"	Sorry, I don't know the proper page number (I'm on a Kindle), so I entered 1. In Chapter 1, under the NumPy description, one of the bullet points has a minor grammatical error. It reads: "Tools for integrating connecting C, C + +, and Fortran code to Python". I assume "integrating connecting" was not intended as is. Note from the Author or Editor: on page 4 of the print text / PDF change "integrating connecting C, C++, ..." to "integrating C, C++, ..."	Anonymous	Oct 24, 2012	May 17, 2013
	Page 1.4 Installation and Setup Installing Necessary Packages	In the part about setting up the environment, the package should be changed from jupyter to jupyterlab to avoid some package/dependency conflicts: change "(pydata-book) $ conda install -y pandas jupyter matplotlib" to "(pydata-book) $ conda install -y pandas jupyterlab matplotlib". Note from the Author or Editor: agreed, fixing	Luiz Henrique	Dec 06, 2022
	Page 2 Python Language Basics, IPython, and Jupyter Notebooks 1st paragraph	"Now in 2022, there is now[...]" --> One "now" should be removed Note from the Author or Editor: will fix	Anonymous	Nov 30, 2022
Printed, PDF, ePub	Page 6-8 Installation and Setup	Dear Sirs: I have just purchased Wes McKinney's Python for Data Analysis. I am trying to install Python as instructed on pages 6-8 of the book, but I am running into problems. It appears that the Python package that comes with EPDFree and the Pandas library are both essential for me to use the book. When I try to install Pandas on top of EPDFree (which is now Canopy Express), I get the error message: "Python version 2.7 required, which was not found in the registry." I am running Windows 7 (32-bit). The author recommends uninstalling the previous version of Python and then installing EPDFree, which has been changed to Enthought Canopy. After I do that, Python does not appear in Add or Remove Programs anymore, but Enthought Canopy does. The Canopy interface works, and it can run a simple script. It says that, contrary to the error message, I do have version 2.7 of Python installed. The author recommends installing pandas-0.9.0.win32-py2.7.exe. Only version 11 is now available, so I downloaded that. When I googled the error message, I got a suggestion to add C:\Python27; and C:\Python27\Scripts; to my system path, but that did not help. Google also gave me a suggestion to uninstall Python (which means Canopy in this case) for all users and re-install for just me. This also did not help. As things now stand, I do not think I will be able to make any use of the book. Is there a forum or an author's page that addresses this problem? Thank you, John Chesnut Note from the Author or Editor: Since publishing the book, Enthought have changed their Python distribution so that the directions are now incompatible. If you run into this problem please install the free Anaconda distribution for your platform (which includes pandas) from here: http://continuum.io/downloads	Anonymous	May 28, 2013	Dec 12, 2014
	Page 7.3 Extension Data Types Table 7.3: pandas extension data types	I found this error in the HTML (web) version: the table reference is linked to the wrong table, so clicking the link takes me to the wrong place. Note from the Author or Editor: fixing in html version	Anonymous	Nov 05, 2022
PDF	Page 9 2nd paragraph	The OS X installation instructions state that we should type "gcc" at the terminal command line to see if gcc is installed. I'm running Mavericks and it is not installed. I believe it's been deprecated by Apple. Is there a workaround for this issue? Thanks Note from the Author or Editor: Yes, Mavericks now uses clang instead of gcc. Editors, could you add a parenthesis that states "(or clang on newer versions of OS X)"	scottclausen@mac.com	Oct 23, 2013	Dec 12, 2014
	Page 9 Plotting and Visualization Python for Data Analysis, 3E (pre release)	I use "jupyter notebook" and "jupyterlab". The examples in "Figures and Subplots" cannot be reproduced with jupyterlab exactly as in the book. Suggestion: at the beginning of the book, it should be pointed out that the examples can be reproduced with "jupyter notebook", while minor adjustments are necessary with jupyterlab. Best regards, Robert Note from the Author or Editor: I'm adding some language for JupyterLab users	Robert Moser	Aug 10, 2022
	Page 9.2 Plotting with pandas and seaborn code for figure Code for figure 9-24	The code attempts to set the title of the chart with: In [109]: ax.title("Changes in log(m1) versus log(unemp)") This raises an exception: TypeError: 'Text' object is not callable. Perhaps it should be: ax.set(title="Changes in log(m1) versus log(unemp)") Note from the Author or Editor: will fix	Mark Meyer	Jul 13, 2022
	Page 13 Data Analysis Examples "Donation Statistics by Occupation and Employer" Section	In the web version of the book, in the code below, the "f" passed to the map method should be "get_emp": def get_emp(x): # If no mapping provided, return x return emp_mapping.get(x, x) fec["contbr_employer"] = fec["contbr_employer"].map(f) Note from the Author or Editor: will fix	Anonymous	Jan 05, 2023
PDF	Page 18 India	The command [json.loads(line) for line in open(path)] produces the following error (traceback abridged; Enthought Canopy 1.4.1, win-x86): UnicodeDecodeError Traceback (most recent call last) <ipython-input-83-b1e0b494454a> in <module>() ----> 1 records = [json.loads(line) for line in open(path)] ... json\__init__.pyc in loads(...) --> 338 return _default_decoder.decode(s) ... json\decoder.pyc in decode(self, s, _w) --> 365 obj, end = self.raw_decode(s, idx=_w(s, 0).end()) ... json\decoder.pyc in raw_decode(self, s, idx) --> 381 obj, end = self.scan_once(s, idx) UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 6: invalid start byte. Please help and explain the reason for the error. Note from the Author or Editor: Editors, can you please change "open(path)" to "open(path, 'rb')"? This will fix this issue for readers using Python 3	Mrinal	Jul 05, 2014	Dec 12, 2014
PDF	Page 23	For the following code example: In [301]: tz_counts[:10].plot(kind='barh', rot=0) The 'plot' function has no visible effect. Should this be run in IPython? (That also doesn't work.) Note from the Author or Editor: There should be a note at the beginning of the chapter to run IPython in pylab mode. Editors: please place a note at the end of the opening paragraph that says: "To follow along with these examples, you should run IPython in Pylab mode by running <literal>ipython --pylab</literal> at the command prompt."	Brian Piercy	Dec 04, 2012	May 17, 2013
Printed, PDF	Page 23 middle of page	In the PDF version, the url overshoots the page Note from the Author or Editor: Editors please insert a line break like so in the console output Out[304]: u'Mozilla/5.0 (Linux; U; Android 2.2.2; en-us; LG-P925/V10e Build/FRG83G) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'	Anonymous	Apr 18, 2013	May 17, 2013
Printed	Page 23 1st code sample	The 2nd and 3rd use of pd.read_table should use the ratings.dat and movies.dat file and not users.dat Note from the Author or Editor: Thanks. This has been fixed	Richard White	Mar 30, 2015
Printed	Page 23 first code block after 2nd paragraph	In the users.dat file downloaded from https://grouplens.org/datasets/movielens/1m/ the data for 'gender' is before 'user_id', e.g. 1::F::1::10::48067 2::M::56::16::70072. Therefore unames should not be defined as: unames = ['user_id', 'gender', 'age', 'occupation', 'zip'] Instead it should be: unames = ['gender', 'user_id', 'age', 'occupation', 'zip'] Note from the Author or Editor: will review	Edward Hope	Jan 18, 2018
Printed, PDF, ePub, Mobi, Other Digital Version	Page 24 two fifths down the page	Found the same problem as CJ. In the following line: operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows') np was not defined, so this line gives an error. Question: Why don't any of these known errata get confirmed/addressed by the author or staff at O'Reilly? Note from the Author or Editor: On page 21 please change the code line In [290]: about halfway down the page from In [290]: import pandas as pd to In [290]: import pandas as pd; import numpy as np This mistake is fairly minor (all things considered) as these code examples are intended to be run in IPython in "pylab" mode (ipython --pylab) which will have imported NumPy and created the np alias. Sorry about that	Moritz Heukamp	May 11, 2013	May 17, 2013
PDF	Page 29 2nd paragraph	totals should be titles: "This produced another DataFrame containing mean ratings with movie totals as row labels and gender as column labels. " should read "This produced another DataFrame containing mean ratings with movie titles as row labels and gender as column labels. " Note from the Author or Editor: Good catch. Editors, please make the indicated change. Thanks	vrajmohan	Sep 26, 2013	Dec 12, 2014
Printed	Page 33 middle	I get a ValueError: array dimensions must agree except for d_0 when I run line 371: names1880.groupby('sex').births.sum(). names1880.groupby('sex')['births'].sum() works. Note from the Author or Editor: We have addressed this (I believe) in a review of the code examples. Will follow up with editors to verify that it is fixed	Allen Long	Nov 03, 2013	Dec 12, 2014
PDF	Page 38 Code on bottom of page 38 and top of page 39	searchsorted() is a method available for NumPy arrays, not Pandas Series. So to get the code in the book to work, I needed to first convert the Series to a NumPy array with array(). In final code, the get_quantile_count() function is as follows: # Get number of distinct names in the top 50% of births using clever NumPy hack def get_quantile_count(group, q=0.5): group = group.sort_index(by='prop', ascending=False) return array(group.prop.cumsum()).searchsorted(q) + 1 Note from the Author or Editor: Ah, this is a casualty of some API changes in pandas: Editors, could you change the indicated line to be instead: group.prop.cumsum().values.searchsorted(q) + 1	Todd Leonhardt	Sep 14, 2013	Dec 12, 2014
Printed	Page 38 United States	After defining the array prop_cumsum you want to call the method searchsorted to search for the 50th percentile. The code supplied is prop_cumsum.searchsorted(0.5), which throws the error Series object has no attribute searchsorted. I got this to sort of work: numpy.searchsorted(prop_cumsum, 0.5); the only problem is the output is every line number in the array followed by the index position. Can you shed any light on the code as written in the text and the code I got to work? Thanks Note from the Author or Editor: This is caused by API changes in pandas. We have fixed the code example in an overall review of the examples, so this will be addressed in the next printing.	Anonymous	Jun 25, 2014	Dec 12, 2014
PDF	Page 40 in [3]	While executing the code from the book: In [3]: data = {i : randn() for i in range(7)} I got the following error: NameError: global name 'randn' is not defined. I solved it by using "from scipy import randn". (Perhaps included packages depend on ipython configuration.) Note from the Author or Editor: Page 46 in the printed text, please insert the line In [541]: import numpy as np right above the In [542]: ... and make sure there is a blank line for consistent formatting	Anonymous	Aug 15, 2012	May 17, 2013
PDF	Page 43 United States	The filename ml-1m/users.dat should be movielens/users.dat Note from the Author or Editor: Correct -- editors, could you make the indicated change (replace ml-1m with movielens)?	Anonymous	Dec 07, 2013	Dec 12, 2014
ePub	Page 46 printed text,	Code from Safari: In [541]: import numpy as np In [542]: data = {i : randn() for i in range(7)} This causes an error: NameError: global name 'randn' is not defined This works data = {i : np.random.randn() for i in range(7)} Appears there is a problem with the 'import numpy as np' being incomplete. Note from the Author or Editor: Good catch, and I believe we tried to correct this error in the last revision. Editors, could you replace the indicated randn with np.random.randn ? thanks	Anonymous	Jun 24, 2013	Dec 12, 2014
PDF	Page 52 top	the two ways of computing top1000 give different results Note from the Author or Editor: I have made a note to look into this since we have made a full review of the book's code examples. There might be a bug in pandas, in which case I will report upstream to the dev team	Anonymous	Dec 07, 2013	Dec 12, 2014
PDF	Page 53 Table 3-1	Commands are given as 'Ctrl-P', 'CTRL-A', etc. with the letter in UPPERCASE, which is potentially confusing, since the keys are to be pressed without the shift key (except 'Ctrl-Shift-v'). In fact, without the example containing a 'Shift', I would not be sure this is an error. Note from the Author or Editor: A fair point. Editors: Please change the single letters in the command shortcuts in Table 3-1 to lowercase. E.g. Ctrl-Shift-V should be Ctrl-Shift-v and Ctrl-B should be Ctrl-b Thanks	Steven Pav	Dec 27, 2012	May 17, 2013
Printed	Page 54 2nd paragraph	... designed to faciliate common tasks ... Note from the Author or Editor: Please fix facilitate typo	Frans Koning	Nov 22, 2012	May 17, 2013
PDF	Page 54 Code example at bottom of page	When I try to do 'a' in _ip.user_ns it throws a NameError exception and says "name '_ip' is not defined. I can use the IPython magic %who to see if the variable is in memory or not. Note from the Author or Editor: I should have known better than to use a private IPython API. editors, could we remove this altogether: In [8]: 'a' in _ip.user_ns Out[8]: True change the line number of the subsequent prompt to 8 (instead of 9) then, remove the following lines: In [1]: 'a' in _ip.user_ns Out[1]: False and add these lines in its place: In [10]: a --------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-10-60b725f10c9c> in <module>() ----> 1 a NameError: name 'a' is not defined thanks	Todd Leonhardt	Sep 15, 2013	Dec 12, 2014
Printed	Page 65 Paragraph 1	what is referred to as Table 3-3 in the text is actually displayed as Table 3-4 Note from the Author or Editor: Confirmed. Please fix reference to Table 3-4	Anonymous	Apr 18, 2013	May 17, 2013
Printed	Page 67 Last sentence of third paragraph	Text reads "Here is a simple list of 700,000 strings ..." but the sample code produces 600,000 strings. Note from the Author or Editor: Good catch. Editors, could you change the copy to say 600,000 instead of 700,000?	James Williamson	May 26, 2013	Dec 12, 2014
Printed	Page 69 Paragraph 4, last sentence	'while' should be 'whole' Note from the Author or Editor: Confirmed, thanks	Anonymous	Apr 18, 2013	May 17, 2013
	Page 72 clean_strings(states)	On page 72, after applying clean_strings(states), "South Carolina" has unwanted space in between, I do believe this is a print error. Sorry if a false flag, just trying to help. Note from the Author or Editor: will improve the example	Mauricio Ruiz	Jun 23, 2022
Printed	Page 75 paragraph 2, sentence 2	'willl' should be 'will' Note from the Author or Editor: Confirmed. thanks	Anonymous	Apr 18, 2013	May 17, 2013
Printed	Page 77 Top bullet points	The third bullet point in the sample configuration changes is unnecessary: it repeats the first clause of the second bullet point. Note from the Author or Editor: good catch. Editors, could you remove the 3rd bullet point?	jworeilly	May 26, 2013	Dec 12, 2014
Printed	Page 83 Last line in table 4-2 on this page	"float64, float128" should read "float64" only. "float128" already correctly appears on the next line in the table (on page 84). Note from the Author or Editor: Correct. Please delete the ", float128" there	Dan Grossman	Jan 25, 2013	May 17, 2013
Printed	Page 86 Final paragraph, first sentence.	"... especially if they have used ..." should read "... especially if you have used ..." Note from the Author or Editor: Thanks, please correct typo as described	Dan Grossman	Jan 25, 2013	May 17, 2013
PDF	Page 89 In [84]:	As randn is a function in the numpy.random module, the line should read: data = np.random.randn(7, 4) Note from the Author or Editor: yes: editors, please make the indicated change	vrajmohan	Sep 17, 2013	Dec 12, 2014
Printed	Page 90 paragraph 1, sentence 2	par 1, sentence 2 is a fragment Note from the Author or Editor: Change the first two sentences of that paragraph to Suppose each name corresponds to a row in the <literal>data</literal> array, and we wanted to select all the rows with corresponding name <literal>'Bob'</literal>.	Anonymous	Apr 18, 2013	May 17, 2013
Printed	Page 95 In [123]: and In [124]:	As in "In [84]:" on page 89, `randn()' should read `np.random.randn()' ... Note from the Author or Editor: Editors: can you please make the indicated change? Replace randn() with np.random.randn()	Kazuyoshi Furutaka	Jun 11, 2014	Dec 12, 2014
Printed, PDF	Page 99 Second to last paragraph	"scalers" should be "scalars"	Wes McKinney	May 13, 2013	May 17, 2013
Printed, PDF	Page 100 United States	1 * cond1 + 2 * cond2 + 3 * -(cond1 | cond2) is not equivalent to the two other code examples offered. In particular, if cond1 and cond2 are both False, the result is 0, not 3. Note from the Author or Editor: Oops. Please change that line of code to 1 * (cond1 & -cond2) + 2 * (cond2 & -cond1) + 3 * -(cond1 | cond2)	Aaron Schumacher	Apr 07, 2013	May 17, 2013
Printed	Page 103 3rd paragraph	In 1st release of 2nd edition print copy, section on numpy fancy indexing (page 103, 3rd paragraph) says, “...the result of fancy indexing is always one-dimensional.” However, there are example outputs in this section with more than one dimension. Is that because some of the examples in the section are not fancy indexing? If that’s the case, it’s unclear where the section is building up to a fancy indexing example as opposed to every example being fancy indexing. The number of dimensions in the output seems to be number of array dimensions minus number of index dimensions, unless index can also have more dimensions than the array. Note from the Author or Editor: will clarify	Stephen Frost	Feb 08, 2018
Printed, PDF	Page 106 Table 4-7	For pinv description remove the word "square" (this function does not require that the matrices be square)	Wes McKinney	May 13, 2013	May 17, 2013
Printed, PDF	Page 106 Table 4-7	In description of lstsq, replace "y = Xb" with the more commonly used "Ax = b"	Wes McKinney	May 13, 2013	May 17, 2013
	Page 107 Table 4-8	Table 4-8: the description for binomial should read 'Draw samples from a binomial distribution' Note from the Author or Editor: Please fix as described. thanks!	Anonymous	Apr 18, 2013	May 17, 2013
Printed, PDF, ePub, Mobi, Other Digital Version	Page 107 Middle of page	Change "See table Table 4-8..." to "See Table 4-8..."	Wes McKinney	May 12, 2013	May 17, 2013
PDF	Page 113 In [17]	In [17]: np.exp(obj2) numpy needs to be imported before this code. There should be a line of code before this code: import numpy as np Note from the Author or Editor: Will add missing import	Dan Yuan	Sep 13, 2017
	Page 114 Bottom half	The text (pdf page 114, book pages 134-135) illustrates the creation of a DataFrame from a dict. First, the dict creation is shown: pop = {'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}} then, the data frame is created: frame3 = pd.DataFrame(pop) That's all fine thus far. However, the display of the DataFrame after creation isn't correct, in that the index order as shown isn't what occurs. That is, the book and pdf show (columns Nevada, Ohio): 2000: NaN, 1.5; 2001: 2.4, 1.7; 2002: 2.9, 3.6. But in reality, the DataFrame is displayed thus, if one follows along with the text as shown: 2001: 2.4, 1.7; 2002: 2.9, 3.6; 2000: NaN, 1.5. In other words, the indices are displayed 2001, 2002, 2000, rather than 2000, 2001, 2002. This matters, because the examples that follow immediately (which involve transposing and also index slicing using that first DataFrame) then won't work as shown. The problem lies with the original dict creation. If the order of the "Nevada" and "Ohio" dicts is swapped, with Ohio being first, then the indices will appear in the desired order (i.e., 2000, 2001, 2002). (However, note that the columns in the resulting DataFrame will also then be swapped, with Ohio appearing as the first column and Nevada second.) The bottom line is that the whole set of examples doesn't work as shown, unfortunately, and there is a cascading effect: the first example is off, and thus so are the following examples based upon the first. Note from the Author or Editor: this was fixed in the 3rd edition	Andrew Boudreau	May 30, 2021
PDF	Page 119 Table 5-5	The description of the copy option for reindex in table 5-5 of the current (as of 8/2/12) preprint version may be wrong. It says that copy is "Do not copy underlying data if new index is equivalent to old index." I believe this is the opposite of copy's behavior, and the words "Do not" should be removed. Note from the Author or Editor: Change text to If True, always copy underlying data even if new index is equivalent to old index. Otherwise, do not copy the data when the indexes are equivalent.	Dan Becker	Aug 02, 2012	May 17, 2013
PDF	Page 123 Table 5-6, 2nd row	"Selects single row of subset of rows from the DataFrame." should probably be "Selects single row or subset of rows from the DataFrame." Note from the Author or Editor: Confirmed typo as described	Guan Yang	Aug 16, 2012	May 17, 2013
Printed	Page 124 table 5.5	Description for argument copy is self contradictory. Appears to say copy true means don't copy Note from the Author or Editor: The text could be clearer. Editors, could you change "Otherwise" to read "If False" (use fixed width font for the False) in the table?	gwideman	Jul 03, 2013	Dec 12, 2014
Printed	Page 125 Last sentence	last sentence: should read 'Here are some examples of this:' Note from the Author or Editor: please fix as described. thanks!	Anonymous	Apr 18, 2013	May 17, 2013
Printed	Page 145 Bottom of page & continuing	"...if you have an axis containing integers, data selection will always be label-oriented." But earlier, on p. 141: "Slicing with labels...the endpoint is inclusive" So why at bottom of p. 145 does ser[:1] not include the endpoint of the slice (only first row returned)? Shouldn't label-oriented slicing of this "axis containing integers" make ser[:1] return the same two rows as ser.loc[:1]? Shouldn't it be the case that only ser.iloc[:1] is not label-oriented, and therefore only it excludes the endpoint of the slice? Note from the Author or Editor: will review	Stephen Frost	Feb 08, 2018
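A short sketch of the distinction the reader is asking about, based on current pandas behavior rather than on the book text. With a default integer index, slicing with plain [] and integers is treated positionally (endpoint excluded), while .loc slices by label (endpoint included) and .iloc is always positional.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.0))  # default integer labels 0, 1, 2

print(ser[:1])       # [] with an integer slice: positional, endpoint excluded
print(ser.loc[:1])   # label-based slice: the endpoint label 1 is included
print(ser.iloc[:1])  # explicitly positional: endpoint excluded
```

This is why ser[:1] returns one row while ser.loc[:1] returns two; scalar lookups like ser[1] are the case where an integer index makes [] label-oriented.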
Printed	Page 150 Bottom of 2nd code block	df1 - df2 should use the '+' operator instead, since the text describes adding. (The '-' operator still produces the same result.)	Shivan Sivakumaran	Oct 08, 2020
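A sketch of why both operators give the same result here, assuming the page uses two frames with no column labels in common (the frames below are illustrative stand-ins). Arithmetic aligns on row and column labels, and no cell has a partner in the other frame.

```python
import pandas as pd

# Two frames with no column labels in common
df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"B": [3, 4]})

added = df1 + df2       # what the surrounding text describes
subtracted = df1 - df2  # what the printed code shows

# Both results have the union of the columns (A and B) and are entirely NaN
print(added)
print(subtracted)
```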
Printed, PDF	Page 152 Final code block	The line currently is: frame = DataFrame(np.arange(6).reshape(3, 2)), index=[2, 0, 1]) It should instead be: frame = DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1]) Note from the Author or Editor: Confirmed. please change as described	Joshua Lande	Mar 14, 2013	May 17, 2013
Printed	Page 152 Second paragraph	Duplicate colons introduce the second example code block. Note from the Author or Editor: Please remove the unnecessary colon	jworeilly	Jun 07, 2013	Dec 12, 2014
Printed	Page 152 Middle	For line [294] of the iget_value code example, the second ")" after the call to reshape(3, 2) is incorrect. Note from the Author or Editor: I believe this is already fixed in the second printing	jworeilly	Jun 07, 2013	Dec 12, 2014
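The corrected constructor line from the two entries above, as a runnable sketch. iget_value has since been removed from pandas; .iat and .iloc are assumed here to be the positional accessors to use in its place.

```python
import numpy as np
import pandas as pd

# Corrected line: no stray ")" after reshape(3, 2)
frame = pd.DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])

first_row = frame.iloc[0]  # row at position 0 (its label is 2)
value = frame.iat[0, 1]    # scalar at row position 0, column position 1
```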
Printed	Page 153 bottom of page	pdata.ix['Adj Close', '5/22/2012':, :] refers to Adj Close. The table below that shows the Close, not the Adj Close. Note from the Author or Editor: Very strange. Editors, can you please change the indicated line of code to: pdata.ix['Adj Close', '5/22/2012':, :] See also revised code examples for an alternative replacement.	Arie Ellerbrak	Aug 01, 2013	Dec 12, 2014
PDF	Page 160 United States	keep_date_col description is inconsistent with the pandas documentation. Should be: If joining columns to parse date, keep the joined columns. Default False Note from the Author or Editor: Confirmed. Please change as described	Thomas Maloney	Jan 04, 2013	May 17, 2013
Printed	Page 162 Middle of the page	In order for data.to_csv(sys.stdout, sep='|') to work, you must import sys first. Note from the Author or Editor: Editors, find this text on the page (writing to sys.stdout so it just prints the text result) change it to (writing to sys.stdout so it just prints the text result; make sure to import sys) use fixed width font for "import sys"	Arie Ellerbrak	Aug 01, 2013	Dec 12, 2014
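A minimal sketch of the fix, using a small hypothetical frame in place of the book's data. The second call writes to an in-memory buffer instead of the console, which makes the pipe-delimited output easy to inspect.

```python
import io
import sys

import pandas as pd

data = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # hypothetical data

# sys must be imported before sys.stdout can be passed as the target
data.to_csv(sys.stdout, sep="|")

# Same call against a StringIO buffer, for checking the text
buf = io.StringIO()
data.to_csv(buf, sep="|")
lines = buf.getvalue().splitlines()
```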
PDF	Page 170 Middle	The Output of perf = DataFrame(data) is not correct. As printed: In [928]: perf Out[928]: Empty DataFrame Columns: array([], dtype=int64) Index: array([], dtype=int64) But should be: <class 'pandas.core.frame.DataFrame'> Int64Index: 648 entries, 0 to 647 Data columns: AGENCY_NAME 648 non-null values CATEGORY 648 non-null values DESCRIPTION 648 non-null values FREQUENCY 648 non-null values INDICATOR_NAME 648 non-null values INDICATOR_UNIT 648 non-null values MONTHLY_ACTUAL 648 non-null values MONTHLY_TARGET 648 non-null values PERIOD_MONTH 648 non-null values PERIOD_YEAR 648 non-null values YTD_ACTUAL 648 non-null values YTD_TARGET 648 non-null values dtypes: int64(2), object(10) Note from the Author or Editor: Confirmed. Please change the text of Out[928]: to <class 'pandas.core.frame.DataFrame'> Int64Index: 648 entries, 0 to 647 Data columns: AGENCY_NAME 648 non-null values CATEGORY 648 non-null values DESCRIPTION 648 non-null values FREQUENCY 648 non-null values INDICATOR_NAME 648 non-null values INDICATOR_UNIT 648 non-null values MONTHLY_ACTUAL 648 non-null values MONTHLY_TARGET 648 non-null values PERIOD_MONTH 648 non-null values PERIOD_YEAR 648 non-null values YTD_ACTUAL 648 non-null values YTD_TARGET 648 non-null values dtypes: int64(2), object(10)	Thomas Maloney	Jan 04, 2013	May 17, 2013
Printed	Page 172 Last paragraph, 2nd sentence	Interally -> Internally Note from the Author or Editor: Confirmed typo	Arie Ellerbrak	Aug 02, 2013	Dec 12, 2014
Printed	Page 175 United States	Current text "...pandas has a read_frame function in its pandas.io.sql module that simplifies the process." Warnings when running code: 1. "read_frame is deprecated, use read_sql " 2. "Reading a table with read_sql is not supported" "for a DBAPI2 connection. Use a SQLAlchemy" "engine or specify a SQL query" This apparently changed with pandas release v0.14.0 (May 31, 2014). Essentially the SQL function names change and the engine object replaces the connection object. The SQL changes are documented in: http://pandas.pydata.org/pandas-docs/stable/pandas.pdf page 8 "SQL interfaces updated to use sqlalchemy, " page 18 "The SQL reading and writing functions now support more database flavors through SQLAlchemy... The new functions read_sql_query() and read_sql_table() are introduced. The function read_sql() is kept as a convenience wrapper around the other two and will delegate to specific function depending on the provided input (database table name or sql query). In practice, you have to provide a SQLAlchemy engine to the sql functions. To connect with SQLAlchemy you use the create_engine() function to create an engine object from database URI. You only need to create the engine once per database you are connecting to. For an in-memory sqlite database: In [43]: from sqlalchemy import create_engine # Create your connection. In [44]: engine = create_engine('sqlite:///:memory:') This engine can then be used to write or read data to/from this database: In [45]: df = pd.DataFrame({'A': [1,2,3], 'B': ['a', 'b', 'c']}) In [46]: df.to_sql('db_table', engine, index=False) You can read data from a database by specifying the table name: In [47]: pd.read_sql_table('db_table', engine) Out[47]: A B 0 1 a 1 2 b 2 3 c or by specifying a sql query: In [48]: pd.read_sql_query('SELECT * FROM db_table', engine) Out[48]: A B 0 1 a 1 2 b 2 3 c" Note from the Author or Editor: We are fixing this in the code example review. Will be fixed in next printing	Jim Callahan	Jul 31, 2014	Dec 12, 2014
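A minimal sketch of the same round trip without installing SQLAlchemy. It assumes, as current pandas documents, that a plain sqlite3 connection is still accepted for to_sql and read_sql_query; for other databases you would pass a SQLAlchemy engine such as create_engine('sqlite://') instead. read_sql_table (lookup by table name) requires SQLAlchemy, so it is not used here.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # in-memory SQLite database

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
df.to_sql("db_table", conn, index=False)

# Query-based read works with the sqlite3 connection
result = pd.read_sql_query("SELECT * FROM db_table", conn)
conn.close()
```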
Printed	Page 175 top	Due to the change to SQLAlchemy, the conn object is replaced by an engine object. The line conn = sqlite3.connect(':memory:') should be replaced as follows. To use a SQLite :memory: database, specify an empty URL: engine = create_engine('sqlite://') Notice that 'sqlite' is in lowercase and without a '3' suffix. For a relative file path, this requires three slashes: engine = create_engine('sqlite:///foo.db') And for an absolute file path, four slashes are used: engine = create_engine('sqlite:////absolute/path/to/foo.db') source: http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#sqlite Note from the Author or Editor: Editors: We are addressing this in the code example review. Reporter: This will be fixed in the next printing	Jim Callahan	Jul 31, 2014	Dec 12, 2014
PDF, ePub, Mobi	Page 192 Beginning of section Pivoting "long" to "wide" Format	The section begins: A common way to store multiple time series in databases and CSV is in so-called long or stacked format: In [116]: ldata[:10] However, the variable ldata has not been defined or initialized previously (or later) in the book. Note from the Author or Editor: Yeah, I left the code to make that DataFrame out as it was derived in a mungy way from the macrodata used earlier. Editors: please put a note in parentheses after "stacked format" that says "... or stacked format (code to create this DataFrame omitted for brevity):" or something. Pretty trivial for the user to type this in	David Kimery	Apr 17, 2013	May 17, 2013
PDF, ePub	Page 192 out 116 and out 118	In chapter 7, in the subsection entitled "Pivoting "long" to "wide" Format" . . . On further examination -- the ldata output in out 116 is only for part of ldata, as in ldata[:10]. This omits five rows of data that should be in ldata based on the rest of the examples in this section: 10 1959-12-31 00:00:00 infl 0.270 11 1959-12-31 00:00:00 unemp 5.600 12 1960-03-31 00:00:00 realgdp 2847.699 13 1960-03-31 00:00:00 infl 2.310 14 1960-03-31 00:00:00 unemp 5.200 Note from the Author or Editor: I need to look into this, but I am going to try to add the code to generate the ldata table. I replied to your other question, but I didn't realize until further examination that the code was omitted. I made a note to myself and will address separately with the editors	Doug McCaleb	Aug 15, 2013	Dec 12, 2014
Printed	Page 192 Belgique	A reader posted earlier the following comment: "The section begins: A common way to store multiple time series in databases and CSV is in so-called long or stacked format: In [116]: ldata[:10] However, the variable ldata has not been defined or initialized previously (or later) in the book. " Perhaps it would be helpful to slightly alter the example to make it immediately testable by the audience of the book: from pandas.core.reshape import melt, pivot df = pd.read_csv('ch07/macrodata.csv') # original format data = df.ix[:,['year', 'quarter', 'realgdp', 'infl', 'unemp']] # selection of variables data['date'] = 10*data['year']+data['quarter'] # some quick identifier for the 'date' instead of separate year and quarter variables del data['year'] del data['quarter'] ldata = melt(data, id_vars = ['date']) # long format pivoted = ldata.pivot('date', 'variable', 'value'); pivoted.head() # Note: 'item' becomes 'variable' in the rest of the example Note from the Author or Editor: OK, sounds good. Editors, could you remove this text: (code to create this DataFrame omitted for brevity) then, after the first code example (ldata[:10]), could you put a code block with this code used to create the example: data = pd.read_csv('ch07/macrodata.csv') periods = pd.PeriodIndex(year=data.year, quarter=data.quarter, name='date') data = DataFrame(data.to_records(), columns=pd.Index(['realgdp', 'infl', 'unemp'], name='item'), index=periods.to_timestamp('D', 'end')) ldata = data.stack().reset_index().rename(columns={0: 'value'})	Patrick Jeuniaux	Oct 14, 2013	Dec 12, 2014
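A self-contained sketch of the wide-to-long-and-back round trip discussed above, so it can be tried without ch07/macrodata.csv. The numbers are placeholders, not the real macrodata values, and melt is used here as one standard way to get the long format; the author's version uses stack instead.

```python
import pandas as pd

# Placeholder values in wide format: one column per series, one row per quarter
wide = pd.DataFrame(
    {"realgdp": [100.0, 101.0], "infl": [0.5, 0.7], "unemp": [5.0, 5.2]},
    index=pd.PeriodIndex(["1959Q1", "1959Q2"], freq="Q", name="date"),
)

# Wide -> long ("stacked"): one row per (date, item) pair, like ldata
ldata = wide.reset_index().melt(id_vars="date", var_name="item",
                                value_name="value")

# Long -> wide again; keyword arguments avoid relying on positional order
pivoted = ldata.pivot(index="date", columns="item", values="value")
```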
PDF	Page 194 3rd paragraph under "Removing Duplicates"	"Relatedly, drop_duplicates returns a DataFrame where the duplicated array is True:" The index values from `data.drop_duplicates()` suggest that drop_duplicates returns rows where the duplicated() array is False. Note from the Author or Editor: Nice catch, will fix in the upcoming printing.	Chapman	Nov 17, 2014	Dec 12, 2014
Printed,	Page 194 3rd paragraph	On the 3rd paragraph of "Removing Duplicates" sub-section: the drop_duplicates function returns where it is FALSE although the book says where it is TRUE. "Relatedly, drop_duplicates returns a DataFrame where the duplicated array is True: In [129]: data.drop_duplicates()" So, the 'True' should be replaced by 'False'. Thanks. Simone. Note from the Author or Editor: This has been fixed in the 2nd edition	Simone Occulate	Dec 15, 2014
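A small check of the corrected wording, using an illustrative frame whose last row repeats the one before it: drop_duplicates keeps exactly the rows where duplicated() is False.

```python
import pandas as pd

data = pd.DataFrame({"k1": ["one", "two"] * 3 + ["two"],
                     "k2": [1, 1, 2, 3, 3, 4, 4]})

dup = data.duplicated()           # True only for the repeated last row
deduped = data.drop_duplicates()

print(deduped.equals(data[~dup]))  # True
```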
Printed	Page 199 Top of page.	The bins are divided into 18 to 25, 26 to 35, 35 to 60 and 60 and older. Should be 18 to 26, 26 to 35, 35 to 60, 60 and older or 18 to 25, 25 to 35, 35 to 60, 60 and older. Note from the Author or Editor: editors, can you please change the copy to: 18 to 25, 26 to 35, 36 to 60, and finally 61 and older	Arie Ellerbrak	Aug 02, 2013	Dec 12, 2014
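A sketch of how the bins behave, assuming the section uses pd.cut with bin edges 18, 25, 35, 60, 100 (the ages list is illustrative). Intervals are closed on the right by default, so for whole-number ages (25, 35] holds 26 through 35 and (35, 60] holds 36 through 60, which matches the corrected wording; note that an age of exactly 18 falls outside (18, 25].

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]  # assumed bin edges

cats = pd.cut(ages, bins)  # right=True by default: (a, b]
labels = [str(interval) for interval in cats.categories]
# -> ['(18, 25]', '(25, 35]', '(35, 60]', '(60, 100]']
```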
PDF	Page 203 Middle of the page	Splitting the categories from the movie dataset can be achieved by using: movies.genres.str.get_dummies('|') Note from the Author or Editor: Awesome. I'll use this feature in the next iteration of the book	Kristof	Sep 05, 2015
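A minimal sketch of the suggestion, with a few hypothetical genre strings in the pipe-delimited MovieLens style. str.get_dummies splits each string on the separator and returns one indicator column per distinct genre, sorted by name.

```python
import pandas as pd

movies = pd.DataFrame({"genres": ["Animation|Children's|Comedy",
                                  "Adventure|Children's|Fantasy",
                                  "Comedy"]})

dummies = movies["genres"].str.get_dummies(sep="|")
```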
PDF	Page 204 somewhere	ch07/movies.dat is not there (is in ch02/movielens) Note from the Author or Editor: Thanks. please change 'ch07/movies.dat' to 'ch02/movielens/movies.dat' in the code	Miki Tebeka	Nov 09, 2012	May 17, 2013
Printed	Page 217 Caption for Figure 7-1	Figure 7-1 displays values by food group, not by nutrient group (Zinc is the nutrient in the example). Its caption should therefore read something along the lines of "Median Zinc values by food group". Note from the Author or Editor: Confirmed the caption is wrong. Will fix	David Garcia Quintas	Sep 07, 2015
	Page 223 Table 8-1	Table 8-1: the description for 'subplot_kw' is cut off Note from the Author or Editor: Please change the description for subplot_kw to: Dict of keywords passed to add_subplot call used to create each subplot.	Anonymous	Apr 18, 2013	May 17, 2013
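A short sketch of what the corrected description means in practice (the polar projection is just an illustrative choice). Every key in subplot_kw is forwarded to the add_subplot call that creates each panel. The Agg backend is selected so that no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# subplot_kw is passed to add_subplot for each subplot: both panels are polar
fig, axes = plt.subplots(1, 2, subplot_kw={"projection": "polar"})
names = [ax.name for ax in axes]
plt.close(fig)
```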
	Page 235 paragraph1, sentence 1	par 1 sentence 1: should read '... is as simple as ...' Note from the Author or Editor: Please fix typo as described. thanks!	Anonymous	Apr 18, 2013	May 17, 2013
PDF	Page 241 somewhere	scatter_matrix(trans_data, diagonal='kde', color='k', alpha=0.3) should be pd.scatter_matrix(trans_data, diagonal='kde', color='k', alpha=0.3) Note from the Author or Editor: Thanks. Please change code as described (add pd. to start of statement)	Miki Tebeka	Nov 09, 2012	May 17, 2013
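A sketch of the same call in current pandas, where the function lives under pd.plotting (the top-level pd.scatter_matrix was later removed). The data is random and hypothetical, and diagonal="hist" is used because diagonal="kde", as in the book, also needs SciPy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
trans_data = pd.DataFrame(rng.standard_normal((50, 3)),
                          columns=["a", "b", "c"])  # hypothetical data

axes = pd.plotting.scatter_matrix(trans_data, diagonal="hist",
                                  color="k", alpha=0.3)
```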
Printed, PDF, ePub, Mobi, Other Digital Version	Page 241-242 Fig 8-23	Fig 8-23 appears to be identical to Fig 8-22 Note from the Author or Editor: Not sure what happened here, 8-23 is supposed to be a different figure if you read the text closely. Here is a figure to replace 8-23 (should just be a drop-in replacement), editors please contact me if you need any changes to this: https://www.dropbox.com/s/annqtoank0snrwu/scatter_matrix_fix_20130512.pdf	Anonymous	Apr 18, 2013	May 17, 2013
PDF	Page 246 Example code	The example code on page 246 (Plotting Maps: Visualizing Haiti Earthquake Crisis Data) no longer works due to a change in pandas v0.13.0, released on 31 Dec 2013. To make it work, x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE) should be x, y = m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values) You may find details on http://stackoverflow.com/questions/23136159 Apart from this, it would also be great to add the following line at the end of the same example code to show the resulting plot. plt.show() Note from the Author or Editor: Editors: please verify that this has been fixed in the overall code example review.	Younghoon Rhiu	Jun 21, 2014	Dec 12, 2014
Printed	Page 266 Top half	demeaned.groupby(key).mean() does not work for me; that is, it yields non-zero values (and not just due to rounding). I think the issue is that the people DataFrame gets reorganized internally with rows in different order. This doesn't seem to affect the alignment of key within people. But it does affect demean, so the values of key no longer line up with their original position. import pandas as pd from pandas import DataFrame import numpy as np def demean(arr): return arr - arr.mean() # This doesn't work. people = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis']) key = ['one', 'two', 'one', 'two', 'one'] demeaned = people.groupby(key).transform(demean) print demeaned print demeaned.groupby(key).mean() produces a b c d e Jim 0.223861 -2.072542 0.973977 -0.021754 -1.019689 Joe 0.326119 0.671576 0.487932 -0.404353 1.219755 Steve -0.223861 2.072542 -0.973977 0.021754 1.019689 Travis 0.204880 -0.422467 -1.024938 -0.555061 -0.563228 Wes -0.530999 -0.249109 0.537006 0.959414 -0.656527 a b c d e one -0.177000 -0.083036 0.179002 0.319805 -0.218842 two 0.265499 0.124555 -0.268503 -0.479707 0.328264 Note from the Author or Editor: This appears to be a bug in pandas unfortunately. I have reported it to the dev team here -- the appropriate action here is to fix the bug rather than changing the book text: https://github.com/pydata/pandas/issues/8046	Ian Gow	Jul 06, 2013	Dec 12, 2014
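A sketch of the same demeaning check on a current pandas version, with random (hypothetical) values. groupby().transform returns a result aligned to the original row order, so grouping the result by the same positional key should give group means of zero up to floating-point rounding. The second form, subtracting transform("mean"), is shown as an equivalent alternative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)
people = pd.DataFrame(rng.standard_normal((5, 5)),
                      columns=["a", "b", "c", "d", "e"],
                      index=["Joe", "Steve", "Wes", "Jim", "Travis"])
key = ["one", "two", "one", "two", "one"]

def demean(arr):
    return arr - arr.mean()

demeaned = people.groupby(key).transform(demean)

# Equivalent formulation with the built-in "mean" transform
demeaned2 = people - people.groupby(key).transform("mean")

print(demeaned.groupby(key).mean())  # approximately zero everywhere
```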
Printed	Page 266 top half	This is in reference to an issue that Ian Gow has also pointed out above (Jul 06, 2013). A possible solution to the problem is mentioned below. Define people as in the book. The values are different since 'randn' gives different numbers. >>> people a b c d e joe 2.011219 0.139871 -0.169945 1.801018 0.560313 steve -0.878164 0.121969 -0.174672 -1.500867 1.548067 wes -0.460175 -0.449552 1.213917 1.250151 0.191200 jim 2.286116 -1.253508 -0.567102 -0.802946 1.432807 travis -0.506323 0.807026 0.960450 -1.266392 0.567154 Define key as in the book: >>> key ['one', 'two', 'one', 'two', 'one'] However, the error is that the following does not give zero mean: demeaned = people.groupby(mapc,axis=0).transform(demean) demeaned.groupby(mapc,axis=0).mean() >>> demeaned = p.groupby(key).transform(demean) >>> demeaned.groupby(key).mean() a b c d e one -0.269472 -0.205111 0.181926 0.218409 -0.082785 two 0.404208 0.307667 -0.272888 -0.327613 0.124178 A possible solution is to do the following. Define mapc as: mapc = {'joe':'one', 'steve':'two', 'wes':'one', 'jim':'two', 'travis':'one'} and now the following produces zero mean: >>> demeaned = p.groupby(mapc).transform(demean) >>> demeaned.groupby(mapc).mean() a b c d e one 7.401487e-17 0 3.700743e-17 3.700743e-17 -4.625929e-17 two 0.000000e+00 0 -1.387779e-17 5.551115e-17 0.000000e+00 Note from the Author or Editor: We are working to address this in pandas: https://github.com/pydata/pandas/issues/8046	Qasim Iqbal	Oct 25, 2013	Dec 12, 2014
Printed, PDF, ePub	Page 271 bottom	This statement from shapelib import ShapeFile requires the shapelib library. I tried to install shapelib and pyshapelib (the binding), but it gave an error shapelibc.so: undefined symbol: SASetupDefaultHooks Judging from the fact that pyshapelib was last updated in 2007, we are wondering if it is still compatible with newer versions of shapelib. Could you recommend another shapelib binding that will work with the examples of the book? Note from the Author or Editor: We may need to remove this example; I know there are various issues with basemap as well. I've made a note and I will follow up with O'Reilly editors	Anonymous	Sep 09, 2013	Dec 12, 2014
PDF	Page 282 somewhere	Should be return totals.order(ascending=False)[:n] (was [-n:]) Note from the Author or Editor: Correct. Please fix code typo as described (replace [-n:] with [:n])	Miki Tebeka	Nov 09, 2012	May 17, 2013
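A sketch of the corrected top-n selection with hypothetical totals. Series.order() was later removed from pandas; sort_values(ascending=False)[:n] is the current spelling of the corrected line, and nlargest(n) is a built-in shortcut that gives the same result here.

```python
import pandas as pd

totals = pd.Series({"x": 10.0, "y": 40.0, "z": 25.0, "w": 5.0})  # hypothetical
n = 2

top = totals.sort_values(ascending=False)[:n]  # largest first, keep the first n
also_top = totals.nlargest(n)
```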
Printed	Page 287 1st line of example "In [108]"	Update: seaborn now requires the keyword arguments "x=" and "y=" for the first two arguments in the referenced example.	Dennis Gonzales	Apr 23, 2021
Printed	Page 308 middle of page	Out[470] should be 'Period('2007-06', 'M')' Note from the Author or Editor: Confirmed, please make change as described. There is also a formatting mistake right before "Out [470]:"; please fix that also	Anonymous	Apr 18, 2013	May 17, 2013
Printed, PDF, ePub, Mobi, Other Digital Version	Page 324 bottom of page	In[570]: spx_px is has not been defined in the chapter yet Note from the Author or Editor: Please add code line just above In [570]: In [569]: spx_px = close_px_all['SPX'] Make sure there is a blank line between that code line and the next one to keep the styling consistent	Anonymous	Apr 18, 2013	May 17, 2013
Printed	Page 324 First paragraph of Exponentially-weighted functions	The formula for the moving average is written as ma_t = a * ma_{t-1} + (a-1) * x_{-t} with a the decay factor. It should be: ma_t = a * ma_{t-1} + (1-a) * x_{t} Note from the Author or Editor: Good catch, please make this change	Bertrand Haut	Mar 06, 2014	Dec 12, 2014
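A sketch that checks the corrected recursion numerically. With decay factor a, ma_t = a * ma_{t-1} + (1 - a) * x_t; pandas' ewm with adjust=False uses the same recursion with alpha = 1 - a and starts from ma_0 = x_0, so the two should agree. The input series is illustrative.

```python
import numpy as np
import pandas as pd

def ewma(x, a):
    """Corrected recursion: ma_t = a * ma_{t-1} + (1 - a) * x_t, ma_0 = x_0."""
    ma = [x[0]]
    for value in x[1:]:
        ma.append(a * ma[-1] + (1 - a) * value)
    return np.array(ma)

x = [1.0, 2.0, 3.0, 4.0]
a = 0.75  # decay factor

manual = ewma(x, a)
via_pandas = pd.Series(x).ewm(alpha=1 - a, adjust=False).mean().to_numpy()
```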
Printed	Page 344 1st paragraph, body of the "to_index" function	The given definition of to_index: def to_index(rets): index = (1 + rets).cumprod() first_loc = max(index.notnull().argmax() - 1, 0) index.values[first_loc] = 1 return index doesn't seem to work with Pandas 0.14.1, firstly due to "index.notnull().argmax() - 1", where index.notnull().argmax() is now a Timestamp without an offset, from which one can't subtract an int. Moreover, one can't compare it against an int, as part of the max() function. The following version works: def to_index(rets): index = (1 + rets).cumprod() first_loc = index.notnull().argmax() index[first_loc] = 1 return index Note from the Author or Editor: Good catch, will fix in the upcoming printing.	David Garcia Quintas	Oct 04, 2014	Dec 12, 2014
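A positional sketch that keeps the intent of the original function: set the slot just before the first valid value to 1 (or slot 0 if the first value is already valid). It uses NumPy's argmax on a boolean array, so it does not depend on whether Series.argmax returns a label or a position in your pandas version. Note that the reporter's suggested version assigns 1 to the first valid entry itself, overwriting it. The return series is hypothetical.

```python
import numpy as np
import pandas as pd

def to_index(rets):
    """Cumulative index of returns, with a base value of 1 before the first valid return."""
    index = (1 + rets).cumprod()
    first_valid = int(index.notna().to_numpy().argmax())  # position, not label
    first_loc = max(first_valid - 1, 0)
    index.iloc[first_loc] = 1
    return index

rets = pd.Series([np.nan, 0.10, -0.05, 0.20],
                 index=pd.date_range("2000-01-03", periods=4))  # hypothetical
idx = to_index(rets)
```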
PDF	Page 345 Signal Frontier Analysis section	The example refers to a mean-reverting strategy and not a momentum portfolio, because we rank returns in descending order. E.g., the highest return gets rank 1, which translates into a lower portfolio weight after demeaning and normalizing. So either we change the text or, if we really want to provide an example of a momentum portfolio, we change the function calc_mon and use ascending=True, i.e. ranks = mom_ret.rank(axis=1, ascending=True) There is another small error in function strat_sr on page 346. Here, when we compute the portfolio, we use a lag value of 1, meaning that for the portfolio at day t we use only information from day t-1 back. This is OK; however, when we then compute the total cumulative returns there is no need to shift the portfolio by one day again, as this implies that we just throw away one day of information, so the line: port = port.shift(1).resample(freq, how='first') should be: port = port.resample(freq, how='first') Note from the Author or Editor: You're right about the momentum portfolio. Editors, on page 345 can you replace the two usages of "momentum" with "mean reversion" and on Page 347, in the Figure 11-3 caption can you also make the same substitution. The second note about the strat_sr function is not errata because the portfolio weights are the portfolio weights: they have to be shifted forward to compute the portfolio returns in the next period, so no changes needed there.	Anonymous	Jul 01, 2014	Dec 12, 2014
Printed	Page 351 Table 11-5. Resample method arguments	The 'freq' argument seems wrong, when trying it explicitly, the following error message is returned: In [22]: ts2 = ts.resample(freq='M') --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-22-ad32b3871b3e> in <module>() ----> 1 ts2 = ts.resample(freq='M') TypeError: resample() got an unexpected keyword argument 'freq' Indeed, in the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.resample.html) it is listed as 'rule' argument, which works: In [25]: ts2 = ts.resample(rule='M').mean() In [26]: ts2 Out[26]: 2000-01-31 -0.123505 2000-02-29 0.011267 2000-03-31 0.180698 2000-04-30 0.007794 Note from the Author or Editor: confirmed, will fix	Yonathan Mizrahi	Oct 08, 2018
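A minimal sketch of the argument naming on a small daily series. The frequency is the first parameter of resample and is named rule, so it can be passed positionally or as rule=. For month-end bins the alias is 'M' in older pandas and 'ME' from pandas 2.2 on; '2D' is used here because it is stable across versions.

```python
import pandas as pd

ts = pd.Series(range(4),
               index=pd.date_range("2000-01-01", periods=4, freq="D"))

two_day = ts.resample(rule="2D").sum()  # keyword form
same = ts.resample("2D").sum()          # positional form, as usually written
```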
Printed	Page 357 Second example "In [221]:"	The example ts.resample('5min', closed='right', label='right', loffset='-1s').sum() now emits: FutureWarning: 'loffset' in .resample() and in Grouper() is deprecated. >>> df.resample(freq="3s", loffset="8H") becomes: >>> from pandas.tseries.frequencies import to_offset >>> df = df.resample(freq="3s").mean() >>> df.index = df.index.to_timestamp() + to_offset("8H") Note from the Author or Editor: will fix	Dennis Gonzales	May 16, 2021
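A sketch of the book's example without loffset: resample first, then shift the result labels explicitly. Because the index here is already a DatetimeIndex, the to_timestamp() step from the warning text is not needed, and a pd.Timedelta is added directly. The one-minute series is assumed to match the book's setup.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12),
               index=pd.date_range("2000-01-01", periods=12, freq="min"))

result = ts.resample("5min", closed="right", label="right").sum()
result.index = result.index + pd.Timedelta("-1s")  # replaces loffset='-1s'
```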
Printed, PDF, ePub, Mobi, Other Digital Version	Page 358 In Figure 12-3	arr.reshape((3,4), order=?) should read arr.reshape((4,3), order=?) Note from the Author or Editor: Correct, please fix figure text as described. Surprised this one evaded me but it's obvious once you see it =)	Dan Grossman	Jan 25, 2013	May 17, 2013
Printed	Page 363 Bottom of page	In box, "The Broadcasting Ru" should be "The Broadcasting Rule"	Wes McKinney	May 12, 2013	May 17, 2013
PDF	Page 365 image	Quote from page 364: "See Figure 12-6 for another illustration, this time subtracting a two-dimensional array from a three-dimensional one across axis 0." Figure 12-6 does not show subtraction, nor do the numbers representing the NumPy data make any sense. Note from the Author or Editor: The figure and text need fixing. The text: change "subtracting... from ..." to "adding...to..." In Figure 12-6, change the numbers in the result to be double what they are, so instead of 0, 1, 2, 3, 4, 5, 6, 7, make them in the corresponding order double that, 0, 2, 4, 6, ...	klo	Oct 31, 2012	May 17, 2013
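An illustrative sketch of the corrected description (the shapes are chosen for the example, not taken from Figure 12-6). A 2D array whose shape matches the trailing two axes of a 3D array is broadcast across axis 0, so adding it to a 3D array whose slices hold the same values doubles every slice.

```python
import numpy as np

arr2d = np.arange(8).reshape((4, 2))       # values 0..7
arr3d = np.broadcast_to(arr2d, (3, 4, 2))  # three copies stacked along axis 0

# (4, 2) lines up with the trailing axes of (3, 4, 2): broadcast over axis 0
result = arr3d + arr2d
```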
Printed, PDF	Page 373 10th Line	Error in code: In [77]: g = df.groupby('key').value Correction: g = df.groupby('key'). Using .value after a groupby method would lead to an error, as ".value" is not an aggregation function. Given the context, I think this should be just g = df.groupby('key')	Bharath Reddy	Mar 11, 2020
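For comparison, a small check suggesting the printed line may be valid: pandas lets you select a column from a GroupBy by attribute access when the column name is a valid identifier, so df.groupby('key').value is a SeriesGroupBy on the value column rather than an error. The frame is a hypothetical stand-in for the one on that page.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c"] * 4, "value": np.arange(12.0)})

g = df.groupby("key").value      # attribute access selects the column
g2 = df.groupby("key")["value"]  # equivalent bracket form

means = g.mean()
```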
Printed	Page 378 index for as_ordered	as_ordered methdo, 378 --> as_ordered method	E G	Feb 18, 2020
Printed	Page 378 n/a	missing index entry and examples for the merge_asof() function, which has existed for a while and seems useful for financial time series. That said, is there a specific reason it has been omitted? Or can one easily implement it with some of the documented functions, etc.? Note from the Author or Editor: will document in 3rd edition	E G	Feb 18, 2020
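A minimal sketch of merge_asof with hypothetical trades and quotes keyed by an integer time. Both frames must be sorted on the key; with the default direction="backward", each trade is matched to the most recent quote at or before its time, and a trade with no earlier quote gets NaN.

```python
import pandas as pd

trades = pd.DataFrame({"time": [1, 5, 10], "trade": ["t1", "t2", "t3"]})
quotes = pd.DataFrame({"time": [2, 3, 7], "quote": [100.0, 101.0, 102.0]})

merged = pd.merge_asof(trades, quotes, on="time")
```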
PDF	Page 390 Next to paw prints at the top	"Assignment is also referred to as binding, as we are binding a name to an object. Variables names that have been assigned may occasionally be referred to as bound variables." At the beginning of the second sentence, I think either 'variables' should be singular or the word 'names' should be removed. :-) Note from the Author or Editor: Editors: on Page 390, "Variables names" should be "Variable names"	Nick Carchedi	Jun 05, 2014	Dec 12, 2014
Printed	Page 400 middle of page	The text currently says: "When aggregating of otherwise grouping time series data, ..." It probably should say "When aggregating or otherwise grouping time series data" Note from the Author or Editor: Please fix typo as described, thanks	Anonymous	Apr 15, 2013	May 17, 2013
	Page 403 example line 'In [85]'	FutureWarning: statsmodels.tsa.AR has been deprecated in favor of statsmodels.tsa.AutoReg and statsmodels.tsa.SARIMAX. AutoReg adds the ability to specify exogenous variables, include time trends, and add seasonal dummies. The AutoReg API differs from AR since the model is treated as immutable, and so the entire specification including the lag length must be specified when creating the model. This change is too substantial to incorporate into the existing AR api. The function ar_select_order performs lag length selection for AutoReg models. AutoReg only estimates parameters using conditional MLE (OLS). Use SARIMAX to estimate ARX and related models using full MLE via the Kalman Filter. Note from the Author or Editor: this is fixed in the 3rd edition	Dennis Gonzales	May 30, 2021
Printed	Page 405 first snippet in page	The code snippet about the "xrange" function needs correction. Replace "x" with "i" in the following example: sum = 0 for i in xrange(10000): # % is the modulo operator: if x % 3 == 0 or x % 5 == 0: sum += i The right code should be: sum = 0 for i in xrange(10000): # % is the modulo operator: if i % 3 == 0 or i % 5 == 0: sum += i Note from the Author or Editor: Good catch. Editors, please change "x" to "i" in the indicated code example as written by the errata reporter	Gaston	Apr 15, 2014	Dec 12, 2014
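A Python 3 rendering of the corrected loop: range replaces xrange, the loop variable i is used consistently, and the accumulator is renamed so it does not shadow the built-in sum().

```python
total = 0
for i in range(10000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

print(total)  # 23331668
```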
PDF	Page 413 3rd IPython display: In[432], Out[434] and Out[435]	The example is correct but you may as well get the names correct too, seeing as the names are those of real people. On the 2nd line of In[432]: ('Schilling', 'Curt') should be ('Curt', 'Schilling') The output for Out[434] and Out[435] will then be corrected accordingly, to: Out[434]: ('Nolan', 'Roger', 'Curt') Out[435]: ('Ryan', 'Clemens', 'Schilling') Note from the Author or Editor: I've fixed this in the book source materials	Anonymous	Sep 07, 2015
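The corrected example as a runnable snippet: with every tuple in (first name, last name) order, zip(*seq) "unzips" the pairs into the two tuples given in the erratum.

```python
pitchers = [("Nolan", "Ryan"), ("Roger", "Clemens"), ("Curt", "Schilling")]

first_names, last_names = zip(*pitchers)
```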
PDF	Page 416 defaultdict examples at the top of the page	The two examples illustrating the usage of defaultdict, don't quite work as described in Python 3 (at least not in v. 3.4.3). For the first example, one cannot see the result in the same form as by the techniques on the previous page, by just typing by_letter; one must type dict(by_letter). Next, it is not clear what the example, counts = defaultdict(lambda: 4) is supposed to produce. Typing counts at the prompt (in IDLE), simply yields defaultdict(<function <lambda> at 0x02DD3B70>, {}) while typing dict(counts), yields {} It is not clear how one could incorporate this construction into the previous example or for a new example, to see how 4 gets used. Note from the Author or Editor: I will confirm that this behaves in the expected way in the 3rd ed	Anonymous	Sep 08, 2015
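A sketch of how both defaultdict examples behave in Python 3 (the word list is illustrative). The repr of a defaultdict includes its default factory, so dict() gives the plain-dict view. The factory runs only when a missing key is looked up, which is why counts looks empty until a key is accessed.

```python
from collections import defaultdict

words = ["apple", "bat", "bar", "atom", "book"]

by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

counts = defaultdict(lambda: 4)
print(dict(counts))  # {}  (nothing has been looked up yet)
counts["x"] += 1     # missing key: starts at 4, then incremented
print(dict(counts))  # {'x': 5}
```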
Printed, PDF, ePub, Mobi, Other Digital Version	Page 418 last line	IT IS: loc_mapping = dict((val, idx) for idx, val in enumerate(strings)} SHOULD BE: loc_mapping = dict((val, idx) for idx, val in enumerate(strings)) NOTE: The last character of the code line should be ) and not }, probably from a faulty copy and paste of the previous code line. It's obvious, but I checked this with IPython. Note from the Author or Editor: Please fix typo as submitter described (replace curly brace with parenthesis) Thanks!	Jose Manuel Martí	May 09, 2013	May 17, 2013
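The corrected line as a runnable snippet, with an illustrative list of strings. The dict comprehension on the last line builds the same mapping.

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

# Corrected generator-expression form: closing ")" instead of "}"
loc_mapping = dict((val, idx) for idx, val in enumerate(strings))

# Equivalent dict comprehension
loc_mapping2 = {val: idx for idx, val in enumerate(strings)}
```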
Printed	Page 419 entire example	The example lacks a function to remove extra whitespace in string "south carolina##". Either the output should be altered at top and bottom of page 419 (i.e. "Out[15]" and "Out[22]") or a function should be added to normalize the whitespace between tokens. E.g. value = ' '.join(value.split()) Note from the Author or Editor: Will fix example and clarify text, since it only strips whitespace from the start and end of the tokens	Craig Murray	Feb 15, 2016
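A sketch of a cleaning function that includes the reporter's whitespace fix. The strip, punctuation-removal and title-casing steps are assumed to approximate the book's version, and the input list is illustrative. The split/join step collapses the run of spaces inside "south   carolina##".

```python
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()               # trim leading/trailing whitespace
        value = re.sub("[!#?]", "", value)  # drop stray punctuation
        value = " ".join(value.split())     # collapse internal runs of spaces
        value = value.title()
        result.append(value)
    return result

states = ["   Alabama ", "Georgia!", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]
cleaned = clean_strings(states)
```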
PDF	Page 420 Bottom third	The main restriction on function arguments it that the keyword arguments must follow the positional arguments (if any). 'it' should be 'is' Note from the Author or Editor: Editors: please change to "The main restriction on function arguments is that"	Nick Carchedi	Jun 06, 2014	Dec 12, 2014
Printed, PDF, ePub, Mobi, Other Digital Version	Page 427 Last code example in section "Currying: Partial Argument Application"	In the code comment: # Take the 60-day moving average of of all time series in data "of" is repeated. Note from the Author or Editor: Please fix typo as described (remove duplicate "of")	Jose Manuel Martí	May 09, 2013	May 17, 2013
Printed, PDF, ePub, Mobi, Other Digital Version	Page 432 Last line in Table A-6	IS: True is the file is closed. SHOULD BE: True if the file is closed. Note from the Author or Editor: Please make change as submitter described (replace is with if)	Jose Manuel Martí	May 10, 2013	May 17, 2013
ePub	Page 712 1st code example, list comprehension for enough_es within for loop	In the first code example for the Nested list comprehensions section, the "if name.count('e') > 2" within the list comprehension should have a ">=" instead of a ">". Note from the Author or Editor: You're right. Editors, could you please make the indicated change?	Todd Leonhardt	Sep 14, 2013	Dec 12, 2014
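A runnable version of the corrected loop, with an illustrative name list. With ">=" the test keeps names containing at least two lowercase "e"s ("Steven" here); with the original ">", nothing in this list would match. str.count is case-sensitive, so the capital E in "Emily" is not counted.

```python
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count("e") >= 2]
    names_of_interest.extend(enough_es)
```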
ePub	Page 727 Top of page, 1st code example	For the output to work as intended in the example, the print statement within def squares() needs to be outside the for loop within that generator function. The way the code is written, the 'Generating squares....' print will occur each time a new number is generated. But if you move the print outside the for, it will print exactly once. Note from the Author or Editor: Good catch. Authors, could you change the code cited to look like this (mind the 4-space indents): def squares(n=10): print 'Generating squares from 1 to %d' % (n ** 2) for i in xrange(1, n + 1): yield i ** 2	Todd Leonhardt	Sep 14, 2013	Dec 12, 2014
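The corrected generator in Python 3 syntax (print() and range instead of the print statement and xrange). Because the print sits before the loop, it runs once, when iteration first starts, not once per yielded value.

```python
def squares(n=10):
    # Runs once, when iteration starts, because it is outside the loop
    print("Generating squares from 1 to %d" % (n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()     # nothing printed yet: the body has not started
values = list(gen)  # prints the message once, then yields 1, 4, ..., 100
```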
Mobi	Page 5385 Example code	The location is based on the location information provided by my Kindle reader using the mobi format. I believe this would be page 229 in the physical edition. In the example code in section "Annotation and Drawing on Subplot," the first element of each tuple in the crisis_data list is of type datetime.datetime. These elements are used as an argument to pandas.asof(). However, this method takes a DateTimeIndex as an argument. Therefore, this date value needs to be converted using pandas.to_datetime() before making the call to asof(). Note from the Author or Editor: will review for 3e	Patton Bradford	Feb 13, 2016
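A small sketch of Series.asof with made-up prices on a DatetimeIndex. asof returns the last valid value at or before the given time. Converting the datetime with pd.to_datetime, as the reporter suggests, works; in recent pandas a plain datetime scalar also appears to be accepted for a DatetimeIndex, so the conversion may be optional on newer versions.

```python
from datetime import datetime

import pandas as pd

spx = pd.Series([1500.0, 1510.0, 1505.0],
                index=pd.to_datetime(["2007-10-11", "2007-10-12", "2007-10-15"]))

date = datetime(2007, 10, 13)  # a weekend date with no row of its own
value = spx.asof(pd.to_datetime(date))  # value as of 2007-10-12
```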