Chapter 8. Estimation

The code for this chapter is in estimation.py. For information about downloading and working with this code, see Using the Code.

The Estimation Game

Let’s play a game. I think of a distribution, and you have to guess what it is. I’ll give you two hints: it’s a normal distribution, and here’s a random sample drawn from it:

[-0.441, 1.774, -0.101, -1.138, 2.975, -2.138]

What do you think is the mean parameter, μ, of this distribution?

One choice is to use the sample mean, x̄, as an estimate of μ. In this example, x̄ is 0.155, so it would be reasonable to guess μ = 0.155. This process is called estimation, and the statistic we used (the sample mean) is called an estimator.
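
As a minimal sketch in plain Python with no libraries, the sample mean x̄ = (1/n) Σ xᵢ can be computed directly from the sample above. The helper name sample_mean is hypothetical and is not necessarily how estimation.py computes it.

```python
# Sample from the first round of the estimation game
sample = [-0.441, 1.774, -0.101, -1.138, 2.975, -2.138]

def sample_mean(xs):
    """Estimate mu with the sample mean: the sum of the values divided by n."""
    return sum(xs) / len(xs)

estimate = sample_mean(sample)
print(round(estimate, 3))  # 0.155
```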

Using the sample mean to estimate μ is so obvious that it is hard to imagine a reasonable alternative. But suppose we change the game by introducing outliers.

I’m thinking of a distribution. It’s a normal distribution, and here’s a sample that was collected by an unreliable surveyor who occasionally puts the decimal point in the wrong place.

[-0.441, 1.774, -0.101, -1.138, 2.975, -213.8]

Now what’s your estimate of μ? If you use the sample mean, your guess is -35.12. Is that the best choice? What are the alternatives?
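
A quick check of the arithmetic, again as a hypothetical plain-Python sketch, shows how one misplaced decimal point drags the estimate: the single value -213.8 contributes -213.8/6 ≈ -35.63 to the mean on its own, far more than all the other values combined.

```python
# Sample from the second round, where one value has a misplaced decimal point
sample = [-0.441, 1.774, -0.101, -1.138, 2.975, -213.8]

# Ordinary sample mean, which gives every value equal weight
mean = sum(sample) / len(sample)
print(round(mean, 2))  # -35.12

# Contribution of the single outlier (the last value) to the mean
outlier_share = sample[-1] / len(sample)
print(round(outlier_share, 2))  # -35.63
```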

One option is to identify and discard outliers, and then compute the sample mean of the rest. ...
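
One simple way to discard outliers without choosing a threshold by hand is a trimmed mean: sort the sample, drop the k smallest and k largest values, and average what remains. This is a sketch under the assumption that a fixed, symmetric trim is acceptable; the function name trimmed_mean and the choice k=1 are hypothetical and not taken from the chapter.

```python
# Sample from the second round, containing one outlier (-213.8)
sample = [-0.441, 1.774, -0.101, -1.138, 2.975, -213.8]

def trimmed_mean(xs, k=1):
    """Sort the values, drop the k smallest and k largest, and average the rest."""
    if k < 0 or 2 * k >= len(xs):
        raise ValueError("k must leave at least one value in the sample")
    kept = sorted(xs)[k:len(xs) - k]
    return sum(kept) / len(kept)

print(round(trimmed_mean(sample, k=1), 4))  # 0.0235
```

With k=1 the outlier -213.8 is removed, but so is 2.975, which is a legitimate value; symmetric trimming gives up some information in exchange for protection against extreme values at either end. With k=0 nothing is dropped and the result is the ordinary sample mean.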
