Errata for Think Stats

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed Page 4
Fifth paragraph

The example in the previous paragraph lists the variable 'pregordr'; however, the variable is misspelled in the following paragraph:
"pregorder is a one-byte integer..."
Note the extra "e".

Scott Breitbach  Sep 13, 2020 
PDF Page 40
5th paragraph, in code sample

In PDF version 1.6, page 40, the following line does not work. In the existing myplot.py (as of 3/19/2015), xscale and yscale are never passed on to pyplot. An alternative is to call myplot.Config(xscale='log', yscale='log').

myplot.Cdf(cdf, complement=True, xscale='linear', yscale='log')

Bo Dong  Mar 20, 2015 
PDF Page 41
last sentence of the 1st paragraph

It says: "By contrast, the exponential distribution with median 1 has 95th percentile of only 1.5."

I think the percentile should be around 4.3. Please confirm, thanks.
For an exponential distribution, median = ln(2)/lam, so when the median is 1, lam is ln(2). The output of the following code snippet is:
expo mean is 1.000050, percentile 95% is 4.361694


import math
import random

import Cdf  # assumed to be Cdf.py from the book's code


def CDFExpo(lam, point_n=1000, name='CDF Expo'):
    # Draw a random sample and evaluate the analytic exponential CDF at each point.
    xs = sorted([random.expovariate(lam) for i in xrange(point_n)])
    ys = [1 - math.exp(-lam * x) for x in xs]
    return Cdf.Cdf(xs, ys, name)

lam = math.log(2)
mean, percentile = 0.5, 0.95
cdf = CDFExpo(lam)
print 'expo mean is %f, percentile 95%% is %f' % (cdf.Value(mean), cdf.Value(percentile))

Bo Dong  Mar 19, 2015 
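
A closed-form check of the figure in the report above, offered as a minimal sketch and not as the author's code: inverting the exponential CDF 1 - exp(-lam * x) gives x = -ln(1 - p) / lam. The helper name expo_percentile is hypothetical, introduced here only for illustration.

```python
import math


def expo_percentile(p, lam):
    """Invert the exponential CDF 1 - exp(-lam * x) at probability p."""
    return -math.log(1.0 - p) / lam


lam = math.log(2)                  # rate for which the median ln(2)/lam equals 1
median = expo_percentile(0.5, lam)
p95 = expo_percentile(0.95, lam)
print("median = %.4f, 95th percentile = %.4f" % (median, p95))
```

With lam = ln(2) the 95th percentile works out to ln(20)/ln(2), about 4.32, which is consistent with the simulated value of roughly 4.36 reported above.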
Printed Page 99
Third formula (for \rho)

The denominator in the formula for \rho as written is identically equal to 0. This is because the deviations of both x and of y sum to zero, i.e. 0*0 = 0.

To remedy, replace each of the sums of deviations in the current version of the denominator with the formula for the corresponding standard deviation:

\rho = \frac{\sum dx_i dy_i}{ \sqrt{\sum dx_i^2} \sqrt{\sum dy_i^2}}.

Cheers,

Stephen Stohs

PS Thanks for creating the great, thin intro stats text!

Stephen Stohs  Apr 06, 2019 
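
The point made in the report above can be checked numerically. The following is a minimal sketch with made-up data: it shows that the raw deviations sum to zero, so a denominator built from them vanishes, while the proposed denominator using square roots of summed squared deviations does not. The function name pearson_rho and the sample values are assumptions for illustration only.

```python
import math


def pearson_rho(xs, ys):
    """Correlation using the corrected denominator from the report."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dxs = [x - mean_x for x in xs]   # deviations of x from its mean
    dys = [y - mean_y for y in ys]   # deviations of y from its mean
    numerator = sum(dx * dy for dx, dy in zip(dxs, dys))
    denominator = (math.sqrt(sum(dx ** 2 for dx in dxs)) *
                   math.sqrt(sum(dy ** 2 for dy in dys)))
    return numerator / denominator


xs = [1.0, 2.0, 3.0, 4.0, 5.0]     # illustrative data, not from the book
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
sum_dx = sum(x - sum(xs) / len(xs) for x in xs)   # sum of raw deviations of x
rho = pearson_rho(xs, ys)
print("sum of x deviations = %.1f, rho = %.4f" % (sum_dx, rho))
```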
Printed Page 104
Residual formula

While not mathematically incorrect, especially in the case of squared residuals, the sign on your residual formula is opposite of the way it is normally defined, including in your definition of deviations on p. 98. (Note that a regression residual is the deviation of a data point from the regression line, which is the conditional mean of y given a value of x.)

The universal convention is to define deviations from the mean as positive for data values above the mean and negative otherwise.

To remedy this, revise to:

\varepsilon_i = y_i - (\alpha + \beta x_i).

Stephen Stohs  Apr 06, 2019 
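
A small sketch of the sign convention proposed above, using made-up numbers: with the residual defined as y_i - (alpha + beta * x_i), points above the fitted line get positive residuals. The function name residuals and the values of alpha, beta, xs, and ys are hypothetical.

```python
def residuals(xs, ys, alpha, beta):
    """Residuals as observed y minus fitted y, so points above the line are positive."""
    return [y - (alpha + beta * x) for x, y in zip(xs, ys)]


xs = [0.0, 1.0, 2.0]               # illustrative data, not from the book
ys = [1.5, 2.0, 4.0]
res = residuals(xs, ys, alpha=1.0, beta=1.0)   # fitted values are 1.0, 2.0, 3.0
print(res)
```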
Printed Page 110
Method ChiSquared in Class PregLengthTest

Wonderful book! I just need an explanation of the following code:

def ChiSquared

expected = self.expected_probs * len(lengths)

Why are the expected_probs multiplied by the number of all entries in the corresponding interval? To get a probability I would do this:

pmf(35) * number_entries_in(35) + ... pmf(44) * number_entries_in(44)

Thank you for your help!
Fabio

Fabio  Mar 06, 2017
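
One possible reading of the line in question, sketched under assumptions: if expected_probs holds the probability of each value and sums to 1, then multiplying by the sample size converts probabilities into expected counts, which is what a chi-squared statistic compares against the observed counts. The function and variable names below are hypothetical and are not the book's code.

```python
def chi_squared(observed_counts, expected_probs):
    """Chi-squared statistic comparing observed counts with expected counts."""
    total = sum(observed_counts)
    # Scale each probability by the sample size to get an expected count.
    expected_counts = [p * total for p in expected_probs]
    return sum((o - e) ** 2 / e
               for o, e in zip(observed_counts, expected_counts))


observed = [18, 30, 52]            # illustrative counts, not from the book
probs = [0.2, 0.3, 0.5]            # probabilities that sum to 1
stat = chi_squared(observed, probs)
print("chi-squared = %.4f" % stat)
```

Here the expected counts are 20, 30, and 50, so the statistic is 4/20 + 0/30 + 4/50.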