## The beta distribution

There is one more optimization that solves this problem even faster.

So far we have used a Pmf object to represent a discrete set of
values for `x`. Now we will use a
continuous distribution, specifically the beta distribution (see http://en.wikipedia.org/wiki/Beta_distribution).

The beta distribution is defined on the interval from 0 to 1 (including both), so it is a natural choice for describing proportions and probabilities. But wait, it gets better.
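For reference, the beta density with parameters α, β > 0 is x^(α−1) (1−x)^(β−1) / B(α, β) for x in [0, 1], where B is the beta function. The following is a minimal sketch using only the Python standard library. The function name `beta_pdf` is illustrative and not taken from the book; it computes the normalizing constant with `math.lgamma` to avoid overflow for large parameters.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of the beta distribution with parameters alpha, beta > 0 at x."""
    if not 0.0 <= x <= 1.0:
        return 0.0  # the beta distribution puts no mass outside [0, 1]
    if (x == 0.0 and alpha < 1) or (x == 1.0 and beta < 1):
        return math.inf  # the density diverges at this endpoint
    # log(1 / B(alpha, beta)) = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm) * x ** (alpha - 1) * (1 - x) ** (beta - 1)

n = 10_000
# Midpoint-rule check that the density integrates to (approximately) 1 over [0, 1]
area = sum(beta_pdf((i + 0.5) / n, 2, 5) for i in range(n)) / n
print(round(area, 4))  # 1.0
```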

It turns out that if you do a Bayesian update with a binomial
likelihood function, as we did in the previous section, the beta
distribution is a **conjugate prior**. That
means that if the prior distribution for `x` is a beta distribution, the posterior is also
a beta distribution. But wait, it gets even better.

The shape of the beta distribution depends on two parameters,
written α and β, or `alpha` and `beta`. If the prior is a beta
distribution with parameters `alpha` and `beta`, and we see data with `h` heads and `t`
tails, the posterior is a beta distribution with parameters `alpha+h`
and `beta+t`. In other words, we can do an update with
two additions.
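To see why two additions are enough, multiply the prior density by the binomial likelihood and drop every factor that does not depend on x (the beta-function normalizer and the binomial coefficient are both constant in x). What remains has exactly the form of a beta density with the shifted parameters:

$$
\begin{aligned}
p(x \mid h, t) &\propto p(x)\, p(h, t \mid x) \\
&\propto x^{\alpha-1}(1-x)^{\beta-1} \cdot \binom{h+t}{h}\, x^{h}(1-x)^{t} \\
&\propto x^{\alpha+h-1}(1-x)^{\beta+t-1},
\end{aligned}
$$

which is, up to normalization, the beta distribution with parameters α + h and β + t.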

So that’s great, but it only works if we can find a beta
distribution that is a good choice for a prior. Fortunately, for many
realistic priors there is a beta distribution that is at least a good
approximation, and for a uniform prior there is a perfect match. The beta
distribution with `alpha=1` and `beta=1` is uniform from 0 to 1.
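Putting these pieces together, a beta-based update needs to store only two numbers. Here is a minimal sketch, assuming a hypothetical `Beta` class with `update` and `mean` methods (these names are illustrative, not the book's code). It starts from the uniform prior `alpha=1`, `beta=1` and applies illustrative data of 140 heads and 110 tails.

```python
class Beta:
    """Hypothetical minimal beta distribution object (illustrative only)."""

    def __init__(self, alpha=1.0, beta=1.0):
        # alpha = beta = 1 is the uniform distribution on [0, 1]
        self.alpha = alpha
        self.beta = beta

    def update(self, heads, tails):
        # Conjugate update with a binomial likelihood: two additions
        self.alpha += heads
        self.beta += tails

    def mean(self):
        # The mean of a beta distribution is alpha / (alpha + beta)
        return self.alpha / (self.alpha + self.beta)


dist = Beta()             # uniform prior
dist.update(140, 110)     # illustrative data: 140 heads, 110 tails
print(dist.alpha, dist.beta)  # 141.0 111.0
print(dist.mean())            # posterior mean, 141 / 252
```

Because the posterior is fully described by `alpha` and `beta`, the update costs the same no matter how much data arrives, unlike a grid-based Pmf whose cost grows with the number of candidate values of `x`.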

Let’s see how we can take advantage of all this. ...
