The beta distribution

There is one more optimization that solves this problem even faster.

So far we have used a Pmf object to represent a discrete set of values for x. Now we will use a continuous distribution, specifically the beta distribution (see http://en.wikipedia.org/wiki/Beta_distribution).

The beta distribution is defined on the interval from 0 to 1 (including both), so it is a natural choice for describing proportions and probabilities. But wait, it gets better.

It turns out that if you do a Bayesian update with a binomial likelihood function, as we did in the previous section, the beta distribution is a conjugate prior. That means that if the prior distribution for x is a beta distribution, the posterior is also a beta distribution. But wait, it gets even better.

The shape of the beta distribution depends on two parameters, written α and β, or alpha and beta. If the prior is a beta distribution with parameters alpha and beta, and we see data with h heads and t tails, the posterior is a beta distribution with parameters alpha+h and beta+t. In other words, we can do an update with two additions.
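To see why the update is just two additions, here is a sketch of the standard conjugacy argument, writing the prior and the likelihood without their normalizing constants (with α, β, h, and t as above):

    \[
    \underbrace{x^{\alpha-1}(1-x)^{\beta-1}}_{\text{beta prior}}
    \times
    \underbrace{x^{h}(1-x)^{t}}_{\text{binomial likelihood}}
    \;\propto\;
    x^{(\alpha+h)-1}\,(1-x)^{(\beta+t)-1}
    \]

The right-hand side is the unnormalized pdf of a beta distribution with parameters α+h and β+t, so the posterior is again a beta distribution and the update only has to add the counts to the parameters.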

So that’s great, but it only works if we can find a beta distribution that is a good choice for a prior. Fortunately, for many realistic priors there is a beta distribution that is at least a good approximation, and for a uniform prior there is a perfect match. The beta distribution with alpha=1 and beta=1 is uniform from 0 to 1.
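To make this concrete, here is a minimal sketch of such an update in Python. It is not the implementation from the book's code; the class name, the method names, and the use of scipy (only to evaluate the pdf) are choices made here for illustration. The example counts, 140 heads and 110 tails, are the Euro problem data used earlier in the chapter.

    from scipy.stats import beta as beta_dist

    class Beta:
        """A beta distribution represented by its parameters alpha and beta."""

        def __init__(self, alpha=1, beta=1):
            # alpha=1, beta=1 is the uniform distribution on [0, 1]
            self.alpha = alpha
            self.beta = beta

        def update(self, data):
            """Update with data, a tuple (heads, tails): just two additions."""
            heads, tails = data
            self.alpha += heads
            self.beta += tails

        def mean(self):
            """Mean of the distribution, alpha / (alpha + beta)."""
            return self.alpha / (self.alpha + self.beta)

        def pdf(self, x):
            """Evaluate the density at x (delegating to scipy for the constant)."""
            return beta_dist.pdf(x, self.alpha, self.beta)

    # Start with a uniform prior, then do one update with the observed counts
    # (140 heads, 110 tails, the Euro problem data).
    suite = Beta(1, 1)
    suite.update((140, 110))
    print(suite.alpha, suite.beta)   # 141 111
    print(suite.mean())              # approximately 0.560

Notice that the update itself never evaluates a density; the whole Bayesian computation is the pair of additions, which is why this approach is so much faster than updating a Pmf over a grid of discrete values.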

Let’s see how we can take advantage of all this. ...
