Let’s play a game. I’ll think of a distribution, and you have to guess what it is. We’ll start out easy and work our way up.

*I’m thinking of a distribution.* I’ll give you
two hints: it’s a normal distribution, and here’s a random sample drawn
from it:

{−0.441, 1.774, −0.101, −1.138, 2.975, −2.138}

What do you think is the mean parameter, *μ*,
of this distribution?

One choice is to use the sample mean to estimate
*μ*. Up until now, we have used the symbol
*μ* for both the sample mean and the mean
parameter, but now, to distinguish them, I will use x̄ for the sample
mean. In this example, x̄ is 0.155, so it would be reasonable to
guess *μ* = 0.155.
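
To check the arithmetic, here is a minimal Python sketch that computes x̄ for the sample above. The names `sample` and `sample_mean` are illustrative choices, not taken from any particular library.

```python
# The six values from the sample above; variable names are illustrative.
sample = [-0.441, 1.774, -0.101, -1.138, 2.975, -2.138]


def sample_mean(xs):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)


xbar = sample_mean(sample)
print(round(xbar, 3))  # prints 0.155
```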

This process is called estimation, and the statistic we used (the sample mean) is called an estimator.

Using the sample mean to estimate *μ* is
so obvious that it is hard to imagine a reasonable alternative. But
suppose we change the game by introducing outliers.

*I’m thinking of a distribution.* It’s a normal
distribution, and here’s a sample that was collected by an unreliable
surveyor who occasionally puts the decimal point in the wrong
place.

{−0.441, 1.774, −0.101, −1.138, 2.975, −213.8}

Now what’s your estimate of *μ*? If you use the
sample mean, your guess is −35.12. Is that the best choice? What are the
alternatives?

One option is to identify and discard outliers, then compute the sample mean of the rest. Another option is to use the median as an estimator.
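
To make the comparison concrete, here is a rough sketch that applies all three estimators to the sample with the misplaced decimal point. The outlier rule it uses is an assumption chosen for illustration, not a rule the text prescribes: it discards values that lie more than `k` median absolute deviations from the median. The helper name `discard_outliers` and the cutoff `k=5` are hypothetical.

```python
import statistics

# The sample containing the surveyor's error.
noisy = [-0.441, 1.774, -0.101, -1.138, 2.975, -213.8]


def discard_outliers(xs, k=5):
    """Keep values within k median absolute deviations of the median.

    This rule and the default k are illustrative assumptions.
    """
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    return [x for x in xs if abs(x - med) <= k * mad]


mean_all = statistics.mean(noisy)        # pulled far off by the outlier
median_all = statistics.median(noisy)    # barely affected by the outlier
kept = discard_outliers(noisy)
mean_kept = statistics.mean(kept)        # mean of the values that remain

print(round(mean_all, 3))    # prints -35.122
print(round(median_all, 3))  # prints -0.271
print(round(mean_kept, 3))   # prints 0.614
```

With this particular rule only −213.8 is discarded. Notice that the median of this sample is the same as the median of the original sample, because the error changed only the smallest value.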

Which estimator is the best depends on the circumstances (for ...
