Python for Data Analysis

Errata for Python for Data Analysis


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.



Version Location Description Submitted By Date Submitted Date Corrected
Printed
Page vi
United States

The technical editor Hugh Brown is listed as Hugh White. Not sure of the page number.

Note from the Author or Editor:
Yes, many apologies. His name is Hugh Brown (and he was a great editor!)

Hugh Brown  Nov 05, 2012  May 17, 2013
Safari Books Online
percentage=42.66560818558702
5th of "Manually Working With Delimited Formats"

Following the text, using Python 2.7.3, I did:
==================================================
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'

reader = csv.reader(f, dialect=my_dialect)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-290-0a8ad3677c01> in <module>()
----> 1 reader = csv.reader(f, dialect=my_dialect)

TypeError: "quoting" must be an integer
==================================================
The following code, which includes the required specification of the quoting field in the my_dialect subclass, gives no error:
==================================================
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL

reader = csv.reader(f, dialect=my_dialect)
==================================================

Note from the Author or Editor:
I think you're right. Editors, could you add the following line in the my_dialect class as indicated: quoting = csv.QUOTE_MINIMAL. Please mind the 4-space indentation; this line should align with the quotechar, delimiter, etc. as listed.
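For readers who want to check the corrected dialect without the book's data file, here is a minimal sketch. The class name MyDialect, the in-memory io.StringIO "file", and the sample rows are illustrative assumptions, not taken from the book.

```python
import csv
import io


class MyDialect(csv.Dialect):
    lineterminator = "\n"
    delimiter = ";"
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL  # the attribute this erratum adds


# io.StringIO stands in for an open file handle (hypothetical sample data).
f = io.StringIO('a;b;c\n1;"2;x";3\n')
rows = list(csv.reader(f, dialect=MyDialect))
print(rows)
```

The quoted field keeps its embedded semicolon, which suggests the dialect is being applied as intended.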

Ruchira S. Datta  Jun 06, 2013  Dec 12, 2014
Mobi
Page 1
On Kindle: "Location 325 of 13301"

Sorry, don't know the proper page number (I'm on a Kindle), so I entered 1. In Chapter 1, under the NumPy description, one of the bullet points has a minor grammatical error. It reads: "Tools for integrating connecting C, C++, and Fortran code to Python". I assume "integrating connecting" was not intended as is.

Note from the Author or Editor:
on page 4 of the print text / PDF change "integrating connecting C, C++, ..." to "integrating C, C++, ..."

Anonymous  Oct 24, 2012  May 17, 2013
Printed, PDF, ePub
Page 6-8
Installation and Setup

Dear Sirs: I have just purchased Wes McKinney's Python for Data Analysis. I am trying to install Python as instructed on pages 6-8 of the book, but I am running into problems. It appears that the Python package that comes with EPDFree and the pandas library are both essential for me to use the book. When I try to install pandas on top of EPDFree (which is now Canopy Express), I get the error message: "Python version 2.7 required, which was not found in the registry." I am running Windows 7 (32-bit). The author recommends uninstalling the previous version of Python and then installing EPDFree, which has been changed to Enthought Canopy. After I do that, Python does not appear in Add or Remove Programs anymore, but Enthought Canopy does. The Canopy interface works, and it can run a simple script. It says that, contrary to the error message, I do have version 2.7 of Python installed. The author recommends installing pandas-0.9.0.win32-py2.7.exe. Only version 11 is now available, so I downloaded that. When I googled the error message, I got a suggestion to add C:\Python27; and C:\Python27\Scripts; to my system path, but that did not help. Google also gave me a suggestion to uninstall Python (which means Canopy in this case) for all users and re-install for just me. This also did not help. As things now stand, I do not think I will be able to make any use of the book. Is there a forum or an author's page that addresses this problem? Thank you, John Chesnut

Note from the Author or Editor:
Since publishing the book Enthought have changed their Python distribution so that the directions are now incompatible. If you run into this problem please install the free Anaconda distribution for your platform (which includes pandas) from here: http://continuum.io/downloads

Anonymous  May 28, 2013  Dec 12, 2014
PDF
Page 9
2nd paragraph

In the OS X installation section, it states that we should type "gcc" at the terminal command line to see if gcc is installed. I'm running Mavericks and it is not installed. I believe it has been deprecated by Apple. Is there a workaround for this issue? Thanks

Note from the Author or Editor:
Yes, Mavericks now uses clang instead of gcc. Editors, could you add a parenthesis that states "(or clang on newer versions of OS X)"

scottclausen@mac.com  Oct 23, 2013  Dec 12, 2014
PDF
Page 18
India

The following command

[json.loads(line) for line in open(path)]

produces the following error:

--------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-83-b1e0b494454a> in <module>()
----> 1 records = [json.loads(line) for line in open(path)]

C:\Users\Mrinal\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.4.1.1975.win-x86\lib\json\__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    336             parse_int is None and parse_float is None and
    337             parse_constant is None and object_pairs_hook is None and not kw):
--> 338         return _default_decoder.decode(s)
    339     if cls is None:
    340         cls = JSONDecoder

C:\Users\Mrinal\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.4.1.1975.win-x86\lib\json\decoder.pyc in decode(self, s, _w)
    363
    364         """
--> 365         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    366         end = _w(s, end).end()
    367         if end != len(s):

C:\Users\Mrinal\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.4.1.1975.win-x86\lib\json\decoder.pyc in raw_decode(self, s, idx)
    379         """
    380         try:
--> 381             obj, end = self.scan_once(s, idx)
    382         except StopIteration:
    383             raise ValueError("No JSON object could be decoded")

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 6: invalid start byte

Please help and explain the reason for the error.

Note from the Author or Editor:
Editors, can you please change "open(path)" to "open(path, 'rb')" ? this will fix this issue for readers using Python 3
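The sketch below shows one way to read line-delimited JSON in binary mode, as the note suggests. The explicit decode with errors="replace" is an added assumption for files that contain a stray non-UTF-8 byte such as 0x92; it is not part of the erratum. The sample records and the temporary file are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample data: two JSON records, the second containing the
# byte 0x92 (an apostrophe in Windows-1252), which is not valid UTF-8.
raw = b'{"tz": "America/New_York"}\n{"a": "it\x92s"}\n'

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Open in binary mode, then decode each line explicitly so that an
    # undecodable byte becomes U+FFFD instead of raising UnicodeDecodeError.
    with open(path, "rb") as f:
        records = [json.loads(line.decode("utf-8", errors="replace")) for line in f]
finally:
    os.remove(path)

print(records[0]["tz"])
```

Replacing bad bytes loses information, so if the file's real encoding is known, decoding with that encoding is likely the better choice.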

Mrinal  Jul 05, 2014  Dec 12, 2014
PDF
Page 23

For the code example following: In [301]: tz_counts[:10].plot(kind='barh', rot=0) The 'plot' function has no visible effect. Should it be run in IPython? (That also doesn't work.)

Note from the Author or Editor:
There should be a note at the beginning of the chapter to run IPython in pylab mode. Editors: please place a note at the end of the opening paragraph that says: "To follow along with these examples, you should run IPython in Pylab mode by running <literal>ipython --pylab</literal> at the command prompt."

Brian Piercy  Dec 04, 2012  May 17, 2013
Printed, PDF
Page 23
middle of page

In the PDF version, the url overshoots the page

Note from the Author or Editor:
Editors please insert a line break like so in the console output Out[304]: u'Mozilla/5.0 (Linux; U; Android 2.2.2; en-us; LG-P925/V10e Build/FRG83G) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'

Anonymous  Apr 18, 2013  May 17, 2013
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 24
two fifths down the page

Found same problem as CJ: "In the following line: operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows') np was not defined, so this line gives an error" Question: Why don't any of these known errata get confirmed/addressed by the author or staff at O'Reilly?

Note from the Author or Editor:
On page 21 please change the code line In [290]: about halfway down the page from In [290]: import pandas as pd to In [290]: import pandas as pd; import numpy as np This mistake is fairly minor (all things considered) as these code examples are intended to be run in IPython in "pylab" mode (ipython --pylab) which will have imported NumPy and created the np alias. Sorry about that
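Outside of pylab mode, the np alias has to be imported explicitly before the np.where line will run. A minimal sketch follows; the three-row cframe is a hypothetical stand-in for the book's DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the book's cframe: user-agent strings in column 'a'.
cframe = pd.DataFrame({
    "a": [
        "Mozilla/5.0 (Windows NT 6.1)",
        "Mozilla/5.0 (Macintosh)",
        "Opera/9.80 (Windows NT 5.1)",
    ]
})

# np.where picks the first label where the mask is True, else the second.
operating_system = np.where(
    cframe["a"].str.contains("Windows"), "Windows", "Not Windows"
)
print(operating_system)
```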

Moritz Heukamp  May 11, 2013  May 17, 2013
PDF
Page 29
2nd paragraph

totals should be titles: "This produced another DataFrame containing mean ratings with movie totals as row labels and gender as column labels. " should read "This produced another DataFrame containing mean ratings with movie titles as row labels and gender as column labels. "

Note from the Author or Editor:
Good catch. Editors, please make the indicated change. Thanks

vrajmohan  Sep 26, 2013  Dec 12, 2014
Printed
Page 33
middle

I get a ValueError: array dimensions must agree except for d_0 when I run line 371: names1880.groupby('sex').births.sum(). names1880.groupby('sex')['births'].sum() works.

Note from the Author or Editor:
We have addressed this (I believe) in a review of the code examples. Will follow up with editors to verify that it is fixed

Allen Long  Nov 03, 2013  Dec 12, 2014
PDF
Page 38
Code on bottom of page 38 and top of page 39

searchsorted() is a method available for NumPy arrays, not pandas Series. So to get the code in the book to work, I needed to first convert the Series to a NumPy array with array(). In final code, the get_quantile_count() function is as follows:

# Get number of distinct names in the top 50% of births using clever NumPy hack
def get_quantile_count(group, q=0.5):
    group = group.sort_index(by='prop', ascending=False)
    return array(group.prop.cumsum()).searchsorted(q) + 1

Note from the Author or Editor:
Ah, this is a casualty of some API changes in pandas: Editors, could you change the indicated line to be instead: group.prop.cumsum().values.searchsorted(q) + 1
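A small sketch of the corrected line in context. The four-row group is hypothetical data, and sort_values is used here on the assumption of a recent pandas release, where the older sort_index(by=...) spelling is no longer available.

```python
import pandas as pd

# Hypothetical proportions for one group; the book's data has a 'prop' column.
group = pd.DataFrame({"prop": [0.1, 0.4, 0.2, 0.3]})


def get_quantile_count(group, q=0.5):
    # Largest proportions first, so the cumulative sum reaches q quickly.
    group = group.sort_values(by="prop", ascending=False)
    # .values exposes the underlying NumPy array, which has searchsorted.
    return group.prop.cumsum().values.searchsorted(q) + 1


count = get_quantile_count(group)
print(count)
```

Here the sorted proportions are 0.4, 0.3, 0.2, 0.1, so two rows are needed to reach half of the total.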

Todd Leonhardt  Sep 14, 2013  Dec 12, 2014
Printed
Page 38
United States

After defining the array prop_cumsum you want to call the method searchsorted to search for the 50th percentile. The code supplied is prop_cumsum.searchsorted(0.5), which throws the error Series object has no Attribute searchsorted I got this to work sort of: numpy.searchsorted(prop_cumsum,0.5), the only problem is the output is every line number in the array followed by the index position. Can you shed any light on the code as written in the text and the code I got to work? Thanks

Note from the Author or Editor:
This is caused by API changes in pandas. We have fixed the code example in an overall review of the examples, so this will be addressed in the next printing.

Anonymous  Jun 25, 2014  Dec 12, 2014
PDF
Page 40
in [3]

While executing the code from the book: In [3]: data = {i : randn() for i in range(7)} I got the following error: NameError: global name 'randn' is not defined. I solved it by using "from scipy import randn". (Perhaps included packages depend on ipython configuration.)

Note from the Author or Editor:
Page 46 in the printed text, please insert the line In [541]: import numpy as np right above the In [542]: ... and make sure there is a blank line for consistent formatting

Anonymous  Aug 15, 2012  May 17, 2013
PDF
Page 43
United States

filename ml-1m/users.dat should be movielens/users.dat

Note from the Author or Editor:
Correct -- editors, could you make the indicated change (replace ml-1m with movielens)?

Anonymous  Dec 07, 2013  Dec 12, 2014
ePub
Page 46
printed text,

Code from Safari: In [541]: import numpy as np In [542]: data = {i : randn() for i in range(7)} This causes an error: NameError: global name 'randn' is not defined This works data = {i : np.random.randn() for i in range(7)} Appears there is a problem with the 'import numpy as np' being incomplete.

Note from the Author or Editor:
Good catch, and I believe we tried to correct this error in the last revision. Editors, could you replace the indicated randn with np.random.randn ? thanks
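With only `import numpy as np` in scope, the bare name randn is undefined, so the fully qualified call is needed. A minimal sketch of the corrected line:

```python
import numpy as np

# np.random.randn() with no arguments returns a single float drawn
# from the standard normal distribution.
data = {i: np.random.randn() for i in range(7)}
print(sorted(data))
```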

Anonymous  Jun 24, 2013  Dec 12, 2014
PDF
Page 52
top

the two ways of computing top1000 give different results

Note from the Author or Editor:
I have made a note to look into this since we have made a full review of the book's code examples. There might be a bug in pandas, in which case I will report upstream to the dev team

Anonymous  Dec 07, 2013  Dec 12, 2014
PDF
Page 53
Table 3-1

Commands are given as 'Ctrl-P', 'CTRL-A', etc. with the letter in UPPERCASE, which is potentially confusing, since the keys are to be pressed without the shift key (except 'Ctrl-Shift-v'). In fact, without the example containing a 'Shift', I would not be sure this is an error.

Note from the Author or Editor:
A fair point. Editors: Please change the single letters in the command shortcuts in Table 3-1 to lowercase. E.g. Ctrl-Shift-V should be Ctrl-Shift-v and Ctrl-B should be Ctrl-b Thanks

Steven Pav  Dec 27, 2012  May 17, 2013
Printed
Page 54
2nd paragraph

... designed to faciliate common tasks ...

Note from the Author or Editor:
Please fix facilitate typo

Frans Koning  Nov 22, 2012  May 17, 2013
PDF
Page 54
Code example at bottom of page

When I try to do 'a' in _ip.user_ns, it throws a NameError exception and says "name '_ip' is not defined". I can use the IPython magic %who to see if the variable is in memory or not.

Note from the Author or Editor:
I should have known better than to use a private IPython API. Editors, could we remove this altogether:

In [8]: 'a' in _ip.user_ns
Out[8]: True

Change the line number of the subsequent prompt to 8 (instead of 9). Then remove the following lines:

In [1]: 'a' in _ip.user_ns
Out[1]: False

and add these lines in their place:

In [10]: a
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-60b725f10c9c> in <module>()
----> 1 a

NameError: name 'a' is not defined

Thanks

Todd Leonhardt  Sep 15, 2013  Dec 12, 2014
Printed
Page 65
Paragraph 1

what is referred to as Table 3-3 in the text is actually displayed as Table 3-4

Note from the Author or Editor:
Confirmed. Please fix reference to Table 3-4

Anonymous  Apr 18, 2013  May 17, 2013
Printed
Page 67
Last sentence of third paragraph

Text reads "Here is a simple list of 700,000 strings ..." but the sample code produces 600,000 strings.

Note from the Author or Editor:
Good catch. Editors, could you change the copy to say 600,000 instead of 700,000?

James Williamson  May 26, 2013  Dec 12, 2014
Printed
Page 69
Paragraph 4, last sentence

'while' should be 'whole'

Note from the Author or Editor:
Confirmed, thanks

Anonymous  Apr 18, 2013  May 17, 2013
Printed
Page 75
paragraph 2, sentence 2

'willl' should be 'will'

Note from the Author or Editor:
Confirmed. thanks

Anonymous  Apr 18, 2013  May 17, 2013
Printed
Page 77
Top bullet points

The third bullet point in the sample configuration changes is unnecessary: it repeats the first clause of the second bullet point.

Note from the Author or Editor:
good catch. Editors, could you remove the 3rd bullet point?

None  May 26, 2013  Dec 12, 2014
Printed
Page 83
Last line in table 4-2 on this page

"float64, float128" should read "float64" only. "float128" already correctly appears on the next line in the table (on page 84).

Note from the Author or Editor:
Correct. Please delete the ", float128" there

Dan Grossman  Jan 25, 2013  May 17, 2013
Printed
Page 86
Final paragraph, first sentence.

"... especially if they have used ..." should read "... especially if you have used ..."

Note from the Author or Editor:
Thanks, please correct typo as described

Dan Grossman  Jan 25, 2013  May 17, 2013
PDF
Page 89
In [84]:

As randn is a function in the numpy.random module, the line should read: data = np.random.randn(7, 4)

Note from the Author or Editor:
yes: editors, please make the indicated change

vrajmohan  Sep 17, 2013  Dec 12, 2014
Printed
Page 90
paragraph 1, sentence 2

par 1, sentence 2 is a fragment

Note from the Author or Editor:
Change the first two sentences of that paragraph to Suppose each name corresponds to a row in the <literal>data</literal> array, and we wanted to select all the rows with corresponding name <literal>'Bob'</literal>.

Anonymous  Apr 18, 2013  May 17, 2013
Printed
Page 95
In [123]: and In [124]:

As in "In [84]:" on page 89, `randn()' should read `np.random.randn()' ...

Note from the Author or Editor:
Editors: can you please make the indicated change? Replace randn() with np.random.randn()

Kazuyoshi Furutaka  Jun 11, 2014  Dec 12, 2014
Printed, PDF
Page 99
Second to last paragraph

"scalers" should be "scalars"

Wes McKinney (O'Reilly Author)  May 13, 2013  May 17, 2013
Printed, PDF
Page 100
United States

1 * cond1 + 2 * cond2 + 3 * -(cond1 | cond2) is not equivalent to the two other code examples offered. In particular, if cond1 and cond2 are both False, the result is 0, not 3.

Note from the Author or Editor:
Oops. Please change that line of code to 1 * (cond1 & -cond2) + 2 * (cond2 & -cond1) + 3 * -(cond1 | cond2)
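The sketch below checks the corrected arithmetic expression against a nested np.where. Two assumptions are made here: the intended mapping is 0 when both conditions hold, 1 for cond1 only, 2 for cond2 only, and 3 otherwise; and `~` is used in place of unary minus, because current NumPy releases reject `-` on boolean arrays.

```python
import numpy as np

# All four combinations of the two conditions.
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# Nested form: 0 if both, 1 if only cond1, 2 if only cond2, else 3.
nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

# Arithmetic form from the corrected line, with ~ as boolean negation.
arithmetic = (1 * (cond1 & ~cond2)
              + 2 * (cond2 & ~cond1)
              + 3 * ~(cond1 | cond2))

print(nested, arithmetic)
```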

Aaron Schumacher  Apr 07, 2013  May 17, 2013
Printed, PDF
Page 106
Table 4-7

For pinv description remove the word "square" (this function does not require that the matrices be square)

Wes McKinney (O'Reilly Author)  May 13, 2013  May 17, 2013
Printed, PDF
Page 106
Table 4-7

In description of lstsq, replace "y = Xb" with the more commonly used "Ax = b"

Wes McKinney (O'Reilly Author)  May 13, 2013  May 17, 2013
Page 107
Table 4-8

Table 4-8: the description for binomial should read 'Draw samples from a binomial distribution'

Note from the Author or Editor:
Please fix as described. thanks!

Anonymous  Apr 18, 2013  May 17, 2013
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 107
Middle of page

Change "See table Table 4-8..." to "See Table 4-8..."

Wes McKinney (O'Reilly Author)  May 12, 2013  May 17, 2013
PDF
Page 119
Table 5-5

The description of the copy option for reindex in table 5-5 of the current (as of 8/2/12) preprint version may be wrong. It says that copy is "Do not copy underlying data if new index is equivalent to old index." I believe this is the opposite of copy's behavior, and the words "Do not" should be removed.

Note from the Author or Editor:
Change text to If True, always copy underlying data even if new index is equivalent to old index. Otherwise, do not copy the data when the indexes are equivalent.

Dan Becker  Aug 02, 2012  May 17, 2013
PDF
Page 123
Table 5-6, 2nd row

"Selects single row of subset of rows from the DataFrame." should probably be "Selects single row or subset of rows from the DataFrame."

Note from the Author or Editor:
Confirmed typo as described

Guan Yang  Aug 16, 2012  May 17, 2013
Printed
Page 124
table 5.5

Description for argument copy is self contradictory. Appears to say copy true means don't copy

Note from the Author or Editor:
The text could be clearer. Editors, could you change "Otherwise" to read "If False" (use fixed width font for the False) in the table?

None  Jul 03, 2013  Dec 12, 2014
Printed
Page 125
Last sentence

last sentence: should read 'Here are some examples of this:'

Note from the Author or Editor:
please fix as described. thanks!

Anonymous  Apr 18, 2013  May 17, 2013
Printed, PDF
Page 152
Final code block

The line currently is: frame = DataFrame(np.arange(6).reshape(3, 2)), index=[2, 0, 1]) It should instead be: frame = DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])

Note from the Author or Editor:
Confirmed. please change as described

Joshua Lande  Mar 14, 2013  May 17, 2013
Printed
Page 152
Second paragraph

Duplicate colons introduce the second example code block.

Note from the Author or Editor:
Please remove the unnecessary colon

None  Jun 07, 2013  Dec 12, 2014
Printed
Page 152
Middle

For line [294] of the iget_value code example, the second ")" after the call to reshape(3, 2) is incorrect.

Note from the Author or Editor:
I believe this is already fixed in the second printing

None  Jun 07, 2013  Dec 12, 2014
Printed
Page 153
bottom of page

pdata.ix['Adj Close', '5/22/2012':, :] refers to Adj Close. The table below that shows the Close, not the Adj Close.

Note from the Author or Editor:
Very strange. Editors, can you please change the indicated line of code to: pdata.ix['Adj Close', '5/22/2012':, :] See also revised code examples for an alternative replacement.

Arie Ellerbrak  Aug 01, 2013  Dec 12, 2014
PDF
Page 160
United States

keep_date_col description is inconsistent with the pandas documentation. Should be: If joining columns to parse date, keep the joined columns. Default False

Note from the Author or Editor:
Confirmed. Please change as described

Thomas Maloney  Jan 04, 2013  May 17, 2013
Printed
Page 162
Middle of the page

In order for data.to_csv(sys.stdout, sep='|') to work you must import sys first

Note from the Author or Editor:
Editors, find this text on the page (writing to sys.stdout so it just prints the text result) change it to (writing to sys.stdout so it just prints the text result; make sure to import sys) use fixed width font for "import sys"

Arie Ellerbrak  Aug 01, 2013  Dec 12, 2014
PDF
Page 170
Middle

The output of perf = DataFrame(data) is not correct. As printed:

In [928]: perf
Out[928]:
Empty DataFrame
Columns: array([], dtype=int64)
Index: array([], dtype=int64)

But should be:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 648 entries, 0 to 647
Data columns:
AGENCY_NAME       648 non-null values
CATEGORY          648 non-null values
DESCRIPTION       648 non-null values
FREQUENCY         648 non-null values
INDICATOR_NAME    648 non-null values
INDICATOR_UNIT    648 non-null values
MONTHLY_ACTUAL    648 non-null values
MONTHLY_TARGET    648 non-null values
PERIOD_MONTH      648 non-null values
PERIOD_YEAR       648 non-null values
YTD_ACTUAL        648 non-null values
YTD_TARGET        648 non-null values
dtypes: int64(2), object(10)

Note from the Author or Editor:
Confirmed. Please change the text of Out[928]: to

<class 'pandas.core.frame.DataFrame'>
Int64Index: 648 entries, 0 to 647
Data columns:
AGENCY_NAME       648 non-null values
CATEGORY          648 non-null values
DESCRIPTION       648 non-null values
FREQUENCY         648 non-null values
INDICATOR_NAME    648 non-null values
INDICATOR_UNIT    648 non-null values
MONTHLY_ACTUAL    648 non-null values
MONTHLY_TARGET    648 non-null values
PERIOD_MONTH      648 non-null values
PERIOD_YEAR       648 non-null values
YTD_ACTUAL        648 non-null values
YTD_TARGET        648 non-null values
dtypes: int64(2), object(10)

Thomas Maloney  Jan 04, 2013  May 17, 2013
Printed
Page 172
Last paragraph, 2nd sentence

Interally -> Internally

Note from the Author or Editor:
Confirmed typo

Arie Ellerbrak  Aug 02, 2013  Dec 12, 2014
Printed
Page 175
top

Due to change to SQLAlchemy the conn object is replaced by an engine object. The line, conn = sqlite3.connect(':memory:') should be replaced by To use a SQLite :memory: database, specify an empty URL: engine = create_engine('sqlite://') Notice that 'sqlite' is in lowercase and without a '3' suffix. For a relative file path, this requires three slashes: engine = create_engine('sqlite:///foo.db') And for an absolute file path, four slashes are used: engine = create_engine('sqlite:////absolute/path/to/foo.db') source: http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#sqlite

Note from the Author or Editor:
Editors: We are addressing this in the code example review. Reporter: This will be fixed in the next printing

Jim Callahan  Jul 31, 2014  Dec 12, 2014
Printed
Page 175
United States

Current text: "...pandas has a read_frame function in its pandas.io.sql module that simplifies the process."

Warnings when running code:
1. "read_frame is deprecated, use read_sql"
2. "Reading a table with read_sql is not supported for a DBAPI2 connection. Use a SQLAlchemy engine or specify a SQL query"

This apparently changed with pandas release v0.14.0 (May 31, 2014). Essentially, the SQL function names change and the engine object replaces the connection object. The SQL changes are documented in: http://pandas.pydata.org/pandas-docs/stable/pandas.pdf

page 8: "SQL interfaces updated to use sqlalchemy,"

page 18: "The SQL reading and writing functions now support more database flavors through SQLAlchemy... The new functions read_sql_query() and read_sql_table() are introduced. The function read_sql() is kept as a convenience wrapper around the other two and will delegate to specific function depending on the provided input (database table name or sql query). In practice, you have to provide a SQLAlchemy engine to the sql functions. To connect with SQLAlchemy you use the create_engine() function to create an engine object from database URI. You only need to create the engine once per database you are connecting to. For an in-memory sqlite database:

In [43]: from sqlalchemy import create_engine
# Create your connection.
In [44]: engine = create_engine('sqlite:///:memory:')

This engine can then be used to write or read data to/from this database:

In [45]: df = pd.DataFrame({'A': [1,2,3], 'B': ['a', 'b', 'c']})
In [46]: df.to_sql('db_table', engine, index=False)

You can read data from a database by specifying the table name:

In [47]: pd.read_sql_table('db_table', engine)
Out[47]:
   A  B
0  1  a
1  2  b
2  3  c

or by specifying a sql query:

In [48]: pd.read_sql_query('SELECT * FROM db_table', engine)
Out[48]:
   A  B
0  1  a
1  2  b
2  3  c"

Note from the Author or Editor:
We are fixing this in the code example review. Will be fixed in next printing
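For readers on a newer pandas who do not want to install SQLAlchemy, one possible workaround is to pass a SQL query (rather than a table name) together with a standard-library sqlite3 connection. This is a sketch under that assumption, not the book's revised example; the table name and rows are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small hypothetical table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a VARCHAR(20), b REAL, c INTEGER)")
con.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("Atlanta", 1.25, 6), ("Tallahassee", 2.6, 3)],
)
con.commit()

# read_sql_query takes a SQL string; reading by bare table name
# (read_sql_table) is the case that requires a SQLAlchemy engine.
df = pd.read_sql_query("SELECT * FROM test", con)
con.close()

print(df)
```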

Jim Callahan  Jul 31, 2014  Dec 12, 2014
PDF, ePub, Mobi
Page 192
Beginning of section Pivoting long to wide Format

The section begins: A common way to store multiple time series in databases and CSV is in so-called long or stacked format: In [116]: ldata[:10] However, the variable ldata has not been defined or initialized previously (or later) in the book.

Note from the Author or Editor:
Yeah, I left the code to make that DataFrame out as it was derived in a mungy way from the macrodata used earlier. Editors: please put a note in parentheses after "stacked format" that says "... or stacked format (code to create this DataFrame omitted for brevity):" or something. pretty trivial for the user to type this in

David Kimery  Apr 17, 2013  May 17, 2013
PDF, ePub
Page 192
out 116 and out 118

In chapter 7, in the subsection entitled "Pivoting "long" to "wide" Format" . . . On further examination -- the ldata output in out 116 is only for part of ldata, as in ldata[:10]. This omits five rows of data that should be in ldata based on the rest of the examples in this section: 10 1959-12-31 00:00:00 infl 0.270 11 1959-12-31 00:00:00 unemp 5.600 12 1960-03-31 00:00:00 realgdp 2847.699 13 1960-03-31 00:00:00 infl 2.310 14 1960-03-31 00:00:00 unemp 5.200

Note from the Author or Editor:
I need to look into this, but I am going to try to add the code to generate the ldata table. I replied to your other question, but I didn't realize until further examination that the code was omitted. I made a note to myself and will address separately with the editors

Doug McCaleb  Aug 15, 2013  Dec 12, 2014
Printed
Page 192
Belgique

A reader posted earlier the following comment: "The section begins: A common way to store multiple time series in databases and CSV is in so-called long or stacked format: In [116]: ldata[:10] However, the variable ldata has not been defined or initialized previously (or later) in the book." Perhaps it would be helpful to slightly alter the example to make it immediately testable by the audience of the book:

from pandas.core.reshape import melt, pivot
df = pd.read_csv('ch07/macrodata.csv')  # original format
data = df.ix[:,['year', 'quarter', 'realgdp', 'infl', 'unemp']]  # selection of variables
data['date'] = 10*data['year']+data['quarter']  # a quick identifier for the 'date' instead of separate year and quarter variables
del data['year']
del data['quarter']
ldata = melt(data, id_vars = ['date'])  # long format
pivoted = ldata.pivot('date', 'variable', 'value'); pivoted.head()
# Note: 'item' becomes 'variable' in the rest of the example

Note from the Author or Editor:
OK, sounds good. Editors, could you remove this text:

(code to create this DataFrame omitted for brevity)

Then, after the first code example (ldata[:10]), could you put a code block with this code used to create the example:

data = pd.read_csv('ch07/macrodata.csv')
periods = pd.PeriodIndex(year=data.year, quarter=data.quarter, name='date')
data = DataFrame(data.to_records(),
                 columns=pd.Index(['realgdp', 'infl', 'unemp'], name='item'),
                 index=periods.to_timestamp('D', 'end'))
ldata = data.stack().reset_index().rename(columns={0: 'value'})

Patrick Jeuniaux  Oct 14, 2013  Dec 12, 2014
PDF
Page 194
3rd paragraph under "Removing Duplicates"

"Relatedly, drop_duplicates returns a DataFrame where the duplicated array is True:" The index values from `data.drop_duplicates()` suggest that drop_duplicates returns rows where the duplicated() array is False.

Note from the Author or Editor:
Nice catch, will fix in the upcoming printing.
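The relationship the reporter describes can be checked directly: drop_duplicates keeps the rows where duplicated() is False. The small DataFrame below is illustrative sample data.

```python
import pandas as pd

data = pd.DataFrame({
    "k1": ["one"] * 3 + ["two"] * 4,
    "k2": [1, 1, 2, 3, 3, 4, 4],
})

dup = data.duplicated()           # True for rows that repeat an earlier row
deduped = data.drop_duplicates()  # keeps the first occurrence of each row

print(dup.tolist())
print(deduped.index.tolist())
```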

Chapman  Nov 17, 2014  Dec 12, 2014
Printed
Page 199
Top of page.

The bins are divided into 18 to 25, 26 to 35, 35 to 60 and 60 and older. Should be 18 to 26, 26 to 35, 35 to 60, 60 and older or 18 to 25, 25 to 35, 35 to 60, 60 and older.

Note from the Author or Editor:
editors, can you please change the copy to: 18 to 25, 26 to 35, 36 to 60, and finally 61 and older
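The corrected wording matches how pandas.cut treats bin edges by default: each interval is open on the left and closed on the right, so for whole-number ages the edge 25 belongs to the first bin and 26 starts the second. A minimal sketch follows; the ages and the top edge of 100 are assumed sample values.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# Default right=True gives intervals (18, 25], (25, 35], (35, 60], (60, 100].
cats = pd.cut(ages, bins)
counts = pd.Series(cats).value_counts().sort_index()

print(counts)
```

Passing right=False to pd.cut would flip which side is closed, in which case the prose description of the bins would need to change accordingly.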

Arie Ellerbrak  Aug 02, 2013  Dec 12, 2014
PDF
Page 204
somewhere

ch07/movies.dat is not there (is in ch02/movielens)

Note from the Author or Editor:
Thanks. please change 'ch07/movies.dat' to 'ch02/movielens/movies.dat' in the code

Miki Tebeka  Nov 09, 2012  May 17, 2013
Page 223
Table 8-1

Table 8-1: the description for 'subplot_kw' is cut off

Note from the Author or Editor:
Please change the description for subplot_kw to Dict of keywords passed to <literal>add_subplot</literal> call used to create each subplot.

Anonymous  Apr 18, 2013  May 17, 2013
Page 235
paragraph1, sentence 1

par 1 sentence 1: should read '... is as simple as ...'

Note from the Author or Editor:
Please fix typo as described. thanks!

Anonymous  Apr 18, 2013  May 17, 2013
PDF
Page 241
somewhere

scatter_matrix(trans_data, diagonal='kde', color='k', alpha=0.3) should be pd.scatter_matrix(trans_data, diagonal='kde', color='k', alpha=0.3)

Note from the Author or Editor:
Thanks. Please change code as described (add pd. to start of statement)

Miki Tebeka  Nov 09, 2012  May 17, 2013
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 241-242
Fig 8-23

Fig 8-23 appears to be identical to Fig 8-22

Note from the Author or Editor:
Not sure what happened here, 8-23 is supposed to be a different figure if you read the text closely. Here is a figure to replace 8-23 (should just be a drop-in replacement), editors please contact me if you need any changes to this: https://www.dropbox.com/s/annqtoank0snrwu/scatter_matrix_fix_20130512.pdf

Anonymous  Apr 18, 2013  May 17, 2013
PDF
Page 246
Example code

The example code on page 246 (Plotting Maps: Visualizing Haiti Earthquake Crisis Data) no longer works due to a change in pandas since v0.13.0, released on 31 Dec 2013. To make it work, x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE) should be x, y = m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values) You may find details at http://stackoverflow.com/questions/23136159 Apart from this, it would also be great to add the following line at the end of the same example code to show the resulting plot: plt.show()

Note from the Author or Editor:
Editors: please verify that this has been fixed in the overall code example review.

Younghoon Rhiu  Jun 21, 2014  Dec 12, 2014
Printed
Page 266
Top half

demeaned.groupby(key).mean() does not work for me; that is, it yields non-zero values (and not just due to rounding). I think the issue is that the people DataFrame gets reorganized internally with rows in different order. This doesn't seem to affect the alignment of key within people. But it does affect demean, so the values of key no longer line up with their original position.

import pandas as pd
from pandas import DataFrame
import numpy as np

def demean(arr):
    return arr - arr.mean()

# This doesn't work.
people = DataFrame(np.random.randn(5, 5),
                   columns=['a', 'b', 'c', 'd', 'e'],
                   index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']
demeaned = people.groupby(key).transform(demean)
print demeaned
print demeaned.groupby(key).mean()

produces

               a         b         c         d         e
Jim     0.223861 -2.072542  0.973977 -0.021754 -1.019689
Joe     0.326119  0.671576  0.487932 -0.404353  1.219755
Steve  -0.223861  2.072542 -0.973977  0.021754  1.019689
Travis  0.204880 -0.422467 -1.024938 -0.555061 -0.563228
Wes    -0.530999 -0.249109  0.537006  0.959414 -0.656527

            a         b         c         d         e
one -0.177000 -0.083036  0.179002  0.319805 -0.218842
two  0.265499  0.124555 -0.268503 -0.479707  0.328264

Note from the Author or Editor:
This appears to be a bug in pandas unfortunately. I have reported it to the dev team here -- the appropriate action here is to fix the bug rather than changing the book text: https://github.com/pydata/pandas/issues/8046

Ian Gow  Jul 06, 2013  Dec 12, 2014
Printed
Page 266
top half

This refers to the issue that Ian Gow pointed out above (Jul 06, 2013). A possible solution to the problem is given below. Define people as in the book. The values are different since 'randn' gives different numbers.
    >>> people
                   a         b         c         d         e
    joe     2.011219  0.139871 -0.169945  1.801018  0.560313
    steve  -0.878164  0.121969 -0.174672 -1.500867  1.548067
    wes    -0.460175 -0.449552  1.213917  1.250151  0.191200
    jim     2.286116 -1.253508 -0.567102 -0.802946  1.432807
    travis -0.506323  0.807026  0.960450 -1.266392  0.567154
Define key as in the book:
    >>> key
    ['one', 'two', 'one', 'two', 'one']
However, the error is that the following does not give zero mean:
    demeaned = people.groupby(mapc,axis=0).transform(demean)
    demeaned.groupby(mapc,axis=0).mean()
    >>> demeaned = p.groupby(key).transform(demean)
    >>> demeaned.groupby(key).mean()
                a         b         c         d         e
    one -0.269472 -0.205111  0.181926  0.218409 -0.082785
    two  0.404208  0.307667 -0.272888 -0.327613  0.124178
A possible solution is to do the following. Define mapc as:
    mapc = {'joe':'one', 'steve':'two', 'wes':'one', 'jim':'two', 'travis':'one'}
and now the following produces zero mean:
    >>> demeaned = p.groupby(mapc).transform(demean)
    >>> demeaned.groupby(mapc).mean()
                    a  b             c             d             e
    one  7.401487e-17  0  3.700743e-17  3.700743e-17 -4.625929e-17
    two  0.000000e+00  0 -1.387779e-17  5.551115e-17  0.000000e+00

Note from the Author or Editor:
We are working to address this in pandas: https://github.com/pydata/pandas/issues/8046

Qasim Iqbal  Oct 25, 2013  Dec 12, 2014
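For reference, the property that both reports on page 266 expect can be stated without pandas: after each group's mean is subtracted from its members, every group mean is zero up to floating-point rounding. The sketch below is plain Python, not the book's code; the input values and the helper name demean_by_group are made up for illustration.

```python
from collections import defaultdict

def demean_by_group(values, keys):
    """Subtract each group's mean from its members, preserving input order."""
    groups = defaultdict(list)
    for value, key in zip(values, keys):
        groups[key].append(value)
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [value - means[key] for value, key in zip(values, keys)]

# Hypothetical column of five values, paired with the key list from the report.
column = [2.0, -1.0, 0.5, 3.0, -0.5]
key = ['one', 'two', 'one', 'two', 'one']

demeaned = demean_by_group(column, key)

# Regroup the demeaned values with the same keys; each mean should be ~0.
for label in sorted(set(key)):
    members = [d for d, k in zip(demeaned, key) if k == label]
    print(label, sum(members) / len(members))
```

Because the result keeps the input order, the same key list lines up with the demeaned values, which is the alignment the reports found broken.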
Printed, PDF, ePub
Page 271
bottom

The statement
    from shapelib import ShapeFile
requires the shapelib library. I tried to install shapelib and pyshapelib (the binding), but it gave an error:
    shapelibc.so: undefined symbol: SASetupDefaultHooks
Since pyshapelib was last updated in 2007, we are wondering whether it is still compatible with newer versions of shapelib. Could you recommend another shapelib binding that will work with the examples in the book?

Note from the Author or Editor:
We may need to remove this example; I know there are various issues with basemap as well. I've made a note and I will follow up with O'Reilly editors

Anonymous  Sep 09, 2013  Dec 12, 2014
PDF
Page 282
somewhere

Should be return totals.order(ascending=False)[:n] (was [-n:])

Note from the Author or Editor:
Correct. Please fix code typo as described (replace [-n:] with [:n])

Miki Tebeka  Nov 09, 2012  May 17, 2013
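The difference between the two slices on page 282 is easy to see on a plain list. This is a minimal sketch using the built-in sorted rather than the book's Series method, and the totals are invented for illustration.

```python
# Hypothetical totals keyed by label (made-up numbers).
totals = {'a': 10, 'b': 40, 'c': 30, 'd': 20}
n = 2

# Sort largest first, mirroring ascending=False.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

top_n = ranked[:n]      # first n entries: the largest totals
bottom_n = ranked[-n:]  # last n entries: the smallest totals

print(top_n)     # [('b', 40), ('c', 30)]
print(bottom_n)  # [('d', 20), ('a', 10)]
```

With a descending sort, [-n:] returns the smallest entries, which is why the corrected code uses [:n].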
Printed
Page 308
middle of page

Out[470] should be 'Period('2007-06', 'M')'

Note from the Author or Editor:
Confirmed, please make the change as described. There is also a formatting mistake right before "Out [470]:"; please fix that also.

Anonymous  Apr 18, 2013  May 17, 2013
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 324
bottom of page

In[570]: spx_px has not been defined in the chapter yet

Note from the Author or Editor:
Please add a code line just above In [570]:
    In [569]: spx_px = close_px_all['SPX']
Make sure there is a blank line between that code line and the next one to keep the styling consistent.

Anonymous  Apr 18, 2013  May 17, 2013
Printed
Page 324
First paragraph of Exponentially-weighted functions

The formula for the moving average is written as
    ma_t = a * ma_{t-1} + (a-1) * x_{-t}
with a the decay factor. It should be:
    ma_t = a * ma_{t-1} + (1-a) * x_{t}

Note from the Author or Editor:
Good catch, please make this change

Bertrand Haut  Mar 06, 2014  Dec 12, 2014
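A plain-Python sketch of the corrected recursion on page 324 may help show why the weights must be a and (1 - a): they sum to 1, so a constant series has a constant average. The function name ewma and the choice to seed the average with the first observation are assumptions made for this sketch, not a description of what pandas does.

```python
def ewma(xs, a):
    """Apply ma_t = a * ma_{t-1} + (1 - a) * x_t, seeded with ma_0 = x_0."""
    if not 0 <= a <= 1:
        raise ValueError("decay factor a must be in [0, 1]")
    averages = []
    ma = None
    for x in xs:
        # First observation seeds the average (an assumption of this sketch).
        ma = x if ma is None else a * ma + (1 - a) * x
        averages.append(ma)
    return averages

print(ewma([1.0, 1.0, 1.0], a=0.5))  # [1.0, 1.0, 1.0]
print(ewma([0.0, 4.0, 4.0], a=0.5))  # [0.0, 2.0, 3.0]
```

With the misprinted weight (a - 1), the two coefficients would sum to 2a - 1 instead of 1, and a constant series would not keep a constant average.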
Printed
Page 344
1st paragraph, body of the "to_index" function

The given definition of to_index:
    def to_index(rets):
        index = (1 + rets).cumprod()
        first_loc = max(index.notnull().argmax() - 1, 0)
        index.values[first_loc] = 1
        return index
doesn't seem to work with pandas 0.14.1, firstly because of "index.notnull().argmax() - 1", where index.notnull().argmax() is now a Timestamp without an offset, from which one can't subtract an int. Moreover, one can't compare it against an int as part of the max() function. The following version works:
    def to_index(rets):
        index = (1 + rets).cumprod()
        first_loc = index.notnull().argmax()
        index[first_loc] = 1
        return index

Note from the Author or Editor:
Good catch will fix in the upcoming printing.

David Garcia Quintas  Oct 04, 2014  Dec 12, 2014
PDF
Page 345
Signal Frontier Analysis section

The example describes a mean-reverting strategy and not a momentum portfolio, because we rank returns in descending order. That is, the highest return gets rank 1, which translates into a lower portfolio weight after demeaning and normalizing. So either we change the text or, if we really want to provide an example of a momentum portfolio, we change the function calc_mom to use ascending=True, i.e.
    ranks = mom_ret.rank(axis=1, ascending=True)
There is another small error in the function strat_sr on page 346. When we compute the portfolio we use a lag value of 1, meaning that for the portfolio at day t we use only information from day t-1 back. This is fine; however, when we then compute the total cumulative returns there is no need to shift the portfolio by one day again, as this implies that we just throw away one day of information. So the line:
    port = port.shift(1).resample(freq, how='first')
should be:
    port = port.resample(freq, how='first')

Note from the Author or Editor:
You're right about the momentum portfolio. Editors, on page 345 can you replace the two usages of "momentum" with "mean reversion" and on Page 347, in the Figure 11-3 caption can you also make the same substitution. The second note about the strat_sr function is not errata because the portfolio weights are the portfolio weights: they have to be shifted forward to compute the portfolio returns in the next period, so no changes needed there.

Anonymous  Jul 01, 2014  Dec 12, 2014
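The ranking argument for page 345 can be checked with a small plain-Python sketch. It is not the book's calc_mom; the helper names and the returns are hypothetical, and the sample standard deviation is used on the assumption that it matches the normalization in the book's code.

```python
from statistics import mean, stdev

def rank_descending(values):
    """Rank 1 goes to the largest value (no tie handling in this sketch)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def standardize(ranks):
    """Demean the ranks and divide by their sample standard deviation."""
    mu, sigma = mean(ranks), stdev(ranks)
    return [(r - mu) / sigma for r in ranks]

# Hypothetical trailing returns for three assets.
returns = {'A': 0.10, 'B': 0.02, 'C': -0.05}

ranks = rank_descending(list(returns.values()))
weights = dict(zip(returns, standardize(ranks)))

print(ranks)    # [1, 2, 3]
print(weights)  # {'A': -1.0, 'B': 0.0, 'C': 1.0}
```

The asset with the highest return ends up with the most negative weight, which is a bet on reversal rather than continuation, consistent with the author's decision to rename the strategy.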
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 358
In Figure 12-3

arr.reshape((3,4), order=?) should read arr.reshape((4,3), order=?)

Note from the Author or Editor:
Correct, please fix figure text as described. Surprised this one evaded me but it's obvious once you see it =)

Dan Grossman  Jan 25, 2013  May 17, 2013
Printed
Page 363
Bottom of page

In box, The Broadcasting Ru should be The Broadcasting Rule

Wes McKinney
O'Reilly Author 
May 12, 2013  May 17, 2013
PDF
Page 365
image

Quote from page 364: "See Figure 12-6 for another illustration, this time subtracting a two-dimensional array from a three-dimensional one across axis 0." Figure 12-6 does not show subtraction, nor do the numbers representing the NumPy data make any sense.

Note from the Author or Editor:
The figure and text need fixing. The text: change "subtracting... from ..." to "adding...to...". In Figure 12-6, change the numbers in the result to be double what they are, so instead of 0, 1, 2, 3, 4, 5, 6, 7, make them, in the corresponding order, double that: 0, 2, 4, 6, ...

klo  Oct 31, 2012  May 17, 2013
PDF
Page 390
Next to paw prints at the top

"Assignment is also referred to as binding, as we are binding a name to an object. Variables names that have been assigned may occasionally be referred to as bound variables." At the beginning of the second sentence, I think either 'variables' should be singular or the word 'names' should be removed. :-)

Note from the Author or Editor:
Editors: on Page 390, "Variables names" should be "Variable names"

Nick Carchedi  Jun 05, 2014  Dec 12, 2014
Printed
Page 400
middle of page

The text currently says: "When aggregating of otherwise grouping time series data, ..." It probably should say "When aggregating or otherwise grouping time series data"

Note from the Author or Editor:
Please fix typo as described, thanks

Anonymous  Apr 15, 2013  May 17, 2013
Printed
Page 405
first snippet in page

The code snippet about the "xrange" function needs correction. Replace "x" with "i" in the following example:
    sum = 0
    for i in xrange(10000):
        # % is the modulo operator:
        if x % 3 == 0 or x % 5 == 0:
            sum += i
The right code should be:
    sum = 0
    for i in xrange(10000):
        # % is the modulo operator:
        if i % 3 == 0 or i % 5 == 0:
            sum += i

Note from the Author or Editor:
Good catch. Editors, please change "x" to "i" in the indicated code example as written by the errata reporter

Gaston  Apr 15, 2014  Dec 12, 2014
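For readers on Python 3, a rough equivalent of the corrected page 405 snippet is sketched below. It is an adaptation, not the book's text: range replaces xrange, and the accumulator is renamed total (a choice made here) so that it does not shadow the built-in sum.

```python
total = 0
for i in range(10000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

print(total)  # 23331668
```

With the original typo, the loop would test an unrelated name x instead of the loop variable, raising NameError if x is undefined or silently giving a wrong total if it is.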
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 418
last line

IT IS:
    loc_mapping = dict((val, idx) for idx, val in enumerate(strings)}
SHOULD BE:
    loc_mapping = dict((val, idx) for idx, val in enumerate(strings))
NOTE: The last character of the code line should be ) and not }, probably a copy-and-paste error from the previous code line. It's obvious, but I checked this with IPython.

Note from the Author or Editor:
Please fix typo as submitter described (replace curly brace with parenthesis) Thanks!

Jose Manuel Martí  May 09, 2013  May 17, 2013
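The two spellings that the page 418 typo mixed up can be shown side by side. This is a small sketch; the contents of strings are a made-up sample.

```python
# Hypothetical input list; any sequence of hashable values works.
strings = ['a', 'as', 'bat', 'car']

# dict() accepts a generator of (key, value) pairs; note the closing ")".
loc_mapping = dict((val, idx) for idx, val in enumerate(strings))

# The equivalent dict comprehension is the form that ends with "}".
loc_mapping_comp = {val: idx for idx, val in enumerate(strings)}

print(loc_mapping)  # {'a': 0, 'as': 1, 'bat': 2, 'car': 3}
```

Both forms build the same mapping from each value to its position in the list.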
PDF
Page 420
Bottom third

The main restriction on function arguments it that the keyword arguments must follow the positional arguments (if any). 'it' should be 'is'

Note from the Author or Editor:
Editors: please change to "The main restriction on function arguments is that"

Nick Carchedi  Jun 06, 2014  Dec 12, 2014
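The restriction quoted from page 420 can be demonstrated directly. The function describe below is hypothetical and exists only for this sketch; the invalid call is passed to compile as a string so that the rest of the example still runs.

```python
def describe(name, greeting='Hello', punctuation='!'):
    """Hypothetical function with one positional and two keyword arguments."""
    return greeting + ', ' + name + punctuation

# Valid: positional argument first, keyword arguments after it.
print(describe('Wes', punctuation='?'))  # Hello, Wes?

# Invalid: a positional argument after a keyword argument is rejected
# when the code is compiled, before anything runs.
try:
    compile("describe(greeting='Hi', 'Wes')", '<example>', 'eval')
except SyntaxError as exc:
    print('SyntaxError:', exc.msg)
```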
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 427
Last code example in section "Currying: Partial Argument Application"

In the code comment: # Take the 60-day moving average of of all time series in data "of" is repeated.

Note from the Author or Editor:
Please fix typo as described (remove duplicate "of")

Jose Manuel Martí  May 09, 2013  May 17, 2013
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 432
Last line in Table A-6

IS: True is the file is closed. SHOULD BE: True if the file is closed.

Note from the Author or Editor:
Please make change as submitter described (replace is with if)

Jose Manuel Martí  May 10, 2013  May 17, 2013
ePub
Page 712
1st code example, list comprehension for enough_es within for loop

In the first code example for the Nested list comprehensions section, the "if name.count('e') > 2" within the list comprehension should have a ">=" instead of a ">".

Note from the Author or Editor:
You're right. Editors, could you please make the indicated change?

Todd Leonhardt  Sep 14, 2013  Dec 12, 2014
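The effect of the comparison operator can be seen on a small sample. The nested list below is hypothetical, and the sketch uses a single nested comprehension rather than the for loop of the book's first example.

```python
# Hypothetical nested list of names.
all_data = [['Tom', 'Jefferson', 'Wesley'],
            ['Susie', 'Jennifer', 'Stephanie']]

# Two or more e's: the comparison must be >=, not >.
two_or_more = [name for names in all_data for name in names
               if name.count('e') >= 2]

# Strictly more than two e's: none of these names qualify.
more_than_two = [name for names in all_data for name in names
                 if name.count('e') > 2]

print(two_or_more)    # ['Jefferson', 'Wesley', 'Jennifer', 'Stephanie']
print(more_than_two)  # []
```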
ePub
Page 727
Top of page, 1st code example

For the output to work as intended in the example, the print statement within def squares() needs to be outside the for loop within that generator function. The way the code is written, the 'Generating squares....' print will occur each time a new number is generated. But if you move the print outside the for, it will print exactly once.

Note from the Author or Editor:
Good catch. Authors could you change the code cited to look like this (mind the 4-space indents):
    def squares(n=10):
        print 'Generating squares from 1 to %d' % (n ** 2)
        for i in xrange(1, n + 1):
            yield i ** 2

Todd Leonhardt  Sep 14, 2013  Dec 12, 2014
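A Python 3 adaptation of the corrected generator (print as a function, range in place of xrange) shows the behavior described in the report. The small driver at the end is an addition for illustration.

```python
def squares(n=10):
    # Runs once, when the first value is requested, because it sits
    # outside the loop.
    print('Generating squares from 1 to %d' % (n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)   # nothing is printed yet; the body has not started
print(list(gen))   # prints the message once, then [1, 4, 9]
```

If the print call were inside the for loop, the message would appear once per generated value instead of once per generator.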