A statistical statement appeared in “The Guardian” on Friday January 4, 2002:
When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. ‘It looks very suspicious to me,’ said Barry Blight, a statistics lecturer at the London School of Economics. ‘If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7%.’
But do these data give evidence that the coin is biased rather than fair?
To answer that question, we’ll proceed in two steps. The first is to estimate the probability that the coin lands face up. The second is to evaluate whether the data support the hypothesis that the coin is biased.
Any given coin has some probability, x, of landing heads up when spun on edge. It seems reasonable to believe that the value of x depends on some physical characteristics of the coin, primarily the distribution of weight.
If a coin is perfectly balanced, we expect x to be close to 50%, but for a lopsided coin, x might be substantially different. We can use Bayes’s theorem and the observed data to estimate x.
Let’s define 101 hypotheses, where Hx is the hypothesis that the probability of heads is x%, for values from 0 to 100. I’ll start with a uniform prior where the probability of Hx is the same for all x. We’ll come back later to consider other priors.
The likelihood function is relatively easy: If Hx is true, the probability of heads is and the probability of tails is .
class Euro(Suite): def Likelihood(self, data, hypo): x = hypo if data == 'H': return x/100.0 else: return 1 - x/100.0
Here’s the code that makes the suite and updates it:
suite = Euro(xrange(0, 101)) dataset = 'H' * 140 + 'T' * 110 for data in dataset: suite.Update(data)
The result is in Figure 4-1.
def MaximumLikelihood(pmf): """Returns the value with the highest probability.""" prob, val = max((prob, val) for val, prob in pmf.Items()) return val
In this case the result is 56, which is also the observed percentage of heads, . So that suggests (correctly) that the observed percentage is the maximum likelihood estimator for the population.
print 'Mean', suite.Mean() print 'Median', thinkbayes.Percentile(suite, 50)
The mean is 55.95; the median is 56. Finally, we can compute a credible interval:
print 'CI', thinkbayes.CredibleInterval(suite, 90)
The result is .
Now, getting back to the original question, we would like to know whether the coin is fair. We observe that the posterior credible interval does not include 50%, which suggests that the coin is not fair.
But that is not exactly the question we started with. MacKay asked, “ Do these data give evidence that the coin is biased rather than fair?” To answer that question, we will have to be more precise about what it means to say that data constitute evidence for a hypothesis. And that is the subject of the next chapter.
But before we go on, I want to address one possible source of
confusion. Since we want to know whether the coin is fair, it might be
tempting to ask for the probability that
x is 50%:
The result is 0.021, but that value is almost meaningless. The decision to evaluate 101 hypotheses was arbitrary; we could have divided the range into more or fewer pieces, and if we had, the probability for any given hypothesis would be greater or less.
We started with a uniform prior, but that might not be a good choice. I can believe that if a coin is lopsided, x might deviate substantially from 50%, but it seems unlikely that the Belgian Euro coin is so imbalanced that x is 10% or 90%.
It might be more reasonable to choose a prior that gives higher probability to values of x near 50% and lower probability to extreme values.
As an example, I constructed a triangular prior, shown in Figure 4-2. Here’s the code that constructs the prior:
def TrianglePrior(): suite = Euro() for x in range(0, 51): suite.Set(x, x) for x in range(51, 101): suite.Set(x, 100-x) suite.Normalize()
Figure 4-2 shows the result (and the uniform prior for comparison). Updating this prior with the same dataset yields the posterior distribution shown in Figure 4-3. Even with substantially different priors, the posterior distributions are very similar. The medians and the credible intervals are identical; the means differ by less than 0.5%.
The code I have shown so far is meant to be easy to read, but it is not very efficient. In general, I like to develop code that is demonstrably correct, then check whether it is fast enough for my purposes. If so, there is no need to optimize. For this example, if we care about run time, there are several ways we can speed it up.
The first opportunity is to reduce the number of times we normalize
the suite. In the original code, we call
Update once for each spin.
dataset = 'H' * heads + 'T' * tails for data in dataset: suite.Update(data)
And here’s what
Update looks like:
def Update(self, data): for hypo in self.Values(): like = self.Likelihood(data, hypo) self.Mult(hypo, like) return self.Normalize()
Each update iterates through the hypotheses, then calls
Normalize, which iterates through
the hypotheses again. We can save some time by doing all of the updates
Suite provides a
that does exactly that. Here it is:
def UpdateSet(self, dataset): for data in dataset: for hypo in self.Values(): like = self.Likelihood(data, hypo) self.Mult(hypo, like) return self.Normalize()
And here’s how we can invoke it:
dataset = 'H' * heads + 'T' * tails suite.UpdateSet(dataset)
This optimization speeds things up, but the run time is still
proportional to the amount of data. We can speed things up even more by
process the entire dataset, rather than one spin at a time.
In the original version,
data is a string that encodes either heads or
def Likelihood(self, data, hypo): x = hypo / 100.0 if data == 'H': return x else: return 1-x
def Likelihood(self, data, hypo): x = hypo / 100.0 heads, tails = data like = x**heads * (1-x)**tails return like
And then we can call
Update like this:
heads, tails = 140, 110 suite.Update((heads, tails))
Since we have replaced repeated multiplication with exponentiation, this version takes the same time for any number of spins.
So far we have used a Pmf object to represent a discrete set of
x. Now we will use a
continuous distribution, specifically the beta distribution (see http://en.wikipedia.org/wiki/Beta_distribution).
The beta distribution is defined on the interval from 0 to 1 (including both), so it is a natural choice for describing proportions and probabilities. But wait, it gets better.
It turns out that if you do a Bayesian update with a binomial
likelihood function, as we did in the previous section, the beta
distribution is a conjugate prior. That
means that if the prior distribution for
x is a beta distribution, the posterior is also
a beta distribution. But wait, it gets even better.
The shape of the beta distribution depends on two parameters,
written α and β, or
beta. If the prior is a beta
distribution with parameters
beta, and we see data with
h heads and
tails, the posterior is a beta distribution with parameters
beta+t. In other words, we can do an update with
So that’s great, but it only works if we can find a beta
distribution that is a good choice for a prior. Fortunately, for many
realistic priors there is a beta distribution that is at least a good
approximation, and for a uniform prior there is a perfect match. The beta
beta=1 is uniform from 0 to 1.
class Beta(object): def __init__(self, alpha=1, beta=1): self.alpha = alpha self.beta = beta
makes a uniform distribution.
performs a Bayesian update:
def Update(self, data): heads, tails = data self.alpha += heads self.beta += tails
data is a pair of integers
representing the number of heads and tails.
So we have yet another way to solve the Euro problem:
beta = thinkbayes.Beta() beta.Update((140, 110)) print beta.Mean()
Mean, which computes a simple function of
def Mean(self): return float(self.alpha) / (self.alpha + self.beta)
For the Euro problem the posterior mean is 56%, which is the same result we got using Pmfs.
def EvalPdf(self, x): return x**(self.alpha-1) * (1-x)**(self.beta-1)
MakePmf, which uses
EvalPdf to generate a discrete approximation of
the beta distribution.
In this chapter we solved the same problem with two different priors and found that with a large dataset, the priors get swamped. If two people start with different prior beliefs, they generally find, as they see more data, that their posterior distributions converge. At some point the difference between their distribution is small enough that it has no practical effect.
When this happens, it relieves some of the worry about objectivity that I discussed in the previous chapter. And for many real-world problems even stark prior beliefs can eventually be reconciled by data.
But that is not always the case. First, remember that all Bayesian analysis is based on modeling decisions. If you and I do not choose the same model, we might interpret data differently. So even with the same data, we would compute different likelihoods, and our posterior beliefs might not converge.
Also, notice that in a Bayesian update, we multiply each prior probability by a likelihood, so if is 0, is also 0, regardless of D. In the Euro problem, if you are convinced that x is less than 50%, and you assign probability 0 to all other hypotheses, no amount of data will convince you otherwise.
This observation is the basis of Cromwell’s rule, which is the recommendation that you should avoid giving a prior probability of 0 to any hypothesis that is even remotely possible (see http://en.wikipedia.org/wiki/Cromwell’s_rule).
Cromwell’s rule is named after Oliver Cromwell, who wrote, “I beseech you, in the bowels of Christ, think it possible that you may be mistaken.” For Bayesians, this turns out to be good advice (even if it’s a little overwrought).
Suppose that instead of observing coin tosses directly, you
measure the outcome using an instrument that is not always correct.
Specifically, suppose there is a probability
y that an actual heads is reported as tails,
or actual tails reported as heads.
Write a class that estimates the bias of a coin given a series
of outcomes and the value of
How does the spread of the posterior distribution depend on
This exercise is inspired by a question posted by a “redditor” named dominosci on Reddit’s statistics “subreddit” at http://reddit.com/r/statistics.
Reddit is an online forum with many interest groups called subreddits. Users, called redditors, post links to online content and other web pages. Other redditors vote on the links, giving an “upvote” to high-quality links and a “downvote” to links that are bad or irrelevant.
A problem, identified by dominosci, is that some redditors are more reliable than others, and Reddit does not take this into account.
The challenge is to devise a system so that when a redditor casts a vote, the estimated quality of the link is updated in accordance with the reliability of the redditor, and the estimated reliability of the redditor is updated in accordance with the quality of the link.
One approach is to model the quality of the link as the probability of garnering an upvote, and to model the reliability of the redditor as the probability of correctly giving an upvote to a high-quality item.
Write class definitions for redditors and links and an update function that updates both objects whenever a redditor casts a vote.