Data Analysis with Open Source Tools

Errata for Data Analysis with Open Source Tools



The errata list records errors, and their corrections, that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.





Each entry lists: version and page, location, description, submitted by, date submitted.
Printed Page 21
in the "Kernel Density Estimates" chapter

The text gives "[...] a formula for the KDE with bandwidth h [...]":

    D_h(x; {x_i}) = sum_{i=1}^{n} (1/h) K((x - x_i)/h)

I think a factor of 1/n is missing in front of the sum; it is needed for the KDE itself to be normalized.

Francois Berenger  Feb 14, 2019 
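
Illustration (not part of the submitted erratum): the normalization point raised for page 21 can be checked numerically. The sketch below assumes a Gaussian kernel and uses a made-up sample and bandwidth; the function names are hypothetical and not taken from the book.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, points, h, normalize=True):
    """Sum of kernels of bandwidth h centred on each point.

    normalize=True applies the 1/n prefactor proposed in the erratum;
    normalize=False follows the formula as quoted from page 21.
    """
    total = sum(gaussian_kernel((x - xi) / h) / h for xi in points)
    return total / len(points) if normalize else total


def integrate(f, lo, hi, steps=20000):
    """Plain trapezoidal rule, accurate enough for a sanity check."""
    dx = (hi - lo) / steps
    inner = sum(f(lo + i * dx) for i in range(1, steps))
    return dx * (0.5 * f(lo) + inner + 0.5 * f(hi))


points = [1.0, 2.5, 2.7, 4.0]   # made-up sample, not data from the book
h = 0.4                         # arbitrary bandwidth

area_with = integrate(lambda x: kde(x, points, h, True), -10.0, 15.0)
area_without = integrate(lambda x: kde(x, points, h, False), -10.0, 15.0)
print(round(area_with, 3), round(area_without, 3))
```

With the 1/n prefactor the area under the curve is approximately 1; without it the area is approximately n (here 4).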
Printed Page 41
kde code listing

The data file ch02_presidents used for the sample code needs the space in "Van Buren" removed: the embedded space throws off column reading in loadtxt.

Steven Heller  Aug 04, 2013 
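
Illustration (not part of the submitted erratum): the column problem reported for page 41 follows from whitespace splitting, which is what loadtxt does when no delimiter is given. The rows below are a hypothetical "name value" layout; the real ch02_presidents file may have different columns and values. Splitting off only the last field is one possible workaround that avoids editing the file.

```python
# Hypothetical rows in a "name value" layout; the real ch02_presidents
# file may differ in columns and values.
rows = [
    "Jackson 96",
    "Van Buren 48",
    "Harrison 1",
]

# Splitting on whitespace yields three fields for the row with the
# embedded space, so the columns no longer line up.
field_counts = [len(row.split()) for row in rows]
print(field_counts)  # [2, 3, 2]


def parse_row(row):
    """Split off the last field only, so spaces in the name survive."""
    name, value = row.rsplit(None, 1)
    return name, int(value)


parsed = [parse_row(row) for row in rows]
print(parsed[1])  # ('Van Buren', 48)
```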
Printed Page 45
First line of the second block of code at the top of the page

The comment is incorrect: the code prints the second and third rows, not columns.

Jerome Boisvert-Chouinard  Jun 09, 2015 
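
Illustration (not part of the submitted erratum): the erratum does not quote the code from page 45, so this sketch only assumes it is a slice along the first axis. It uses plain nested lists with arbitrary values. In NumPy the analogous expressions are a[1:3] for rows and a[:, 1:3] for columns.

```python
# A small 4x4 table as a list of rows; the values are arbitrary.
table = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]

# Slicing the outer list selects rows: here the second and third rows.
second_and_third_rows = table[1:3]

# Selecting columns means slicing inside every row instead.
second_and_third_cols = [row[1:3] for row in table]

print(second_and_third_rows)
print(second_and_third_cols)
```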
Printed Page 52
Graph at top of page

The graph is incorrect: it was generated with erroneous code (there is an error in the Gaussian kernel calculation; see my submitted erratum for page 73).

Jerome Boisvert-Chouinard  Jun 09, 2015 
Printed Page 73
Code snippet at the bottom of the page

The code for the LOESS function contains an error. The line

    w = exp ( -0.5*( ((x-xp/h))**2)/sqrt(2*pi*h**2) )

should be

    w = exp ( -0.5*( ((x-xp/h))**2) ) / sqrt(2*pi*h**2)

The sqrt(2*pi*h**2) factor should not be inside the exponent. See, for example, page 20, where the Gaussian kernel is defined.

Jerome Boisvert-Chouinard  Jun 09, 2015 
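
Illustration (not part of the submitted erratum): a minimal sketch of a Gaussian weight with the normalization outside the exponent, as the page 73 erratum proposes. The function name is hypothetical. The sketch groups the argument as (x - xp) / h, the usual form of a Gaussian kernel; the lines quoted in the erratum group it differently, and which form the book prints is not verified here.

```python
import math


def gaussian_weight(x, xp, h):
    """Gaussian kernel weight of point xp as seen from x, bandwidth h.

    The normalisation divides the exponential; it is not part of the
    exponent. The grouping (x - xp) / h is an assumption of this sketch.
    """
    u = (x - xp) / h
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi * h * h)


h = 0.5
peak = gaussian_weight(1.0, 1.0, h)          # equals 1 / sqrt(2*pi*h**2)
one_bandwidth_away = gaussian_weight(1.0 + h, 1.0, h)

# One bandwidth from the centre the weight drops by a factor exp(-0.5).
print(peak, one_bandwidth_away / peak)
```

Placing the square-root factor inside the exponent would change the effective width of the kernel instead of only rescaling it.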
PDF Page 97
3rd line from the bottom

In the call to the correlate function, the longer array needs to be first and the shorter one second. The resulting array has 201 elements, so it is not clear to me why the last line on page 97 is needed.

Phillip Wilmarth  Feb 19, 2013 
PDF Page 101
Figure 5-3

The image for Figure 5-3 (the false-color plot) does not appear on my iPod Touch, although it does appear when viewed in the Preview application. The front and back cover images also do not appear on my iPod Touch.

Note from the Author or Editor:
Can't confirm or deny - seems to be a technical problem, not a content erratum.

Doug Raines  Nov 18, 2010 
Printed Page 106
Last line

The sentence currently reads: "... the individual subplots become too small as that we could still recognize anything useful..." It should be something like "...become too small to recognize anything useful..."

Yuliya Gorlina  Nov 10, 2011 
Printed Page 127
first and only footnote

The StatLib URL http://lib.stat.cmu.edu/datasets/visualizing.data.zip to the zip file containing the Mauna Loa dataset does not appear to be valid any longer.

Josef Assad  May 27, 2016 
PDF Page 253
2nd paragraph, 4th line

The motivating example for naively calculating the average of averages reads: "Just adding up the individual defect rates per item and dividing by 3 (in effect, averaging them) did not seem right—if only because it would come out to about 0.75"

The proper value for this calculation should be "about 0.5":

    $ python -c'print (0.5+1.0+0.01)/3'
    0.503333333333

The 0.75 figure seems to be a miscalculation of dividing by 2:

    $ python -c'print (0.5+1.0+0.01)/2'
    0.755

Robert Iwatt  Jun 15, 2013 
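
Illustration (not part of the submitted erratum): the same arithmetic in Python 3 syntax, using the three rates from the commands above. The variable names are hypothetical.

```python
rates = [0.5, 1.0, 0.01]   # the three defect rates used in the erratum

mean_of_three = sum(rates) / len(rates)   # dividing by 3, as the text says
divided_by_two = sum(rates) / 2           # the suspected miscalculation

print(f"{mean_of_three:.3f} {divided_by_two:.3f}")  # 0.503 0.755
```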
Printed Page 301-302
top of 301, fourth paragraph on 302

The definition of dot product is very non-standard. In my opinion that should be pointed out. Also, I don't understand why the value of a dot product must be in [0,1]. Is there an assumption that all x_i y_i are positive?

Yuliya Gorlina  Nov 13, 2011
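
Illustration (not part of the submitted erratum): the erratum does not quote the book's definition, so this sketch only covers the general point behind the question. It assumes a length-normalized dot product (cosine similarity). That quantity lies in [-1, 1] in general and in [0, 1] only when every product x_i y_i is non-negative, for example when all components are non-negative. The function names and vectors are made up.

```python
import math


def dot(x, y):
    """Standard dot product: sum of component-wise products."""
    return sum(a * b for a, b in zip(x, y))


def normalized_dot(x, y):
    """Dot product divided by both vector lengths (cosine similarity)."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))


non_negative = normalized_dot([1.0, 2.0, 0.0], [3.0, 0.5, 4.0])
mixed_signs = normalized_dot([1.0, -2.0, 0.0], [-3.0, 0.5, 4.0])
print(non_negative, mixed_signs)
```

The first value is positive and below 1; the second is negative, which shows that the [0, 1] range depends on a sign assumption.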