Lions and tigers and bears

I’ll start with a simplified version of the problem where we know that there are exactly three species. Let’s call them lions, tigers and bears. Suppose we visit a wild animal preserve and see 3 lions, 2 tigers and one bear.

If we have an equal chance of observing any animal in the preserve, the number of each species we see is governed by the multinomial distribution. If the prevalences of lions, tigers and bears are p_lion, p_tiger and p_bear, the likelihood of seeing 3 lions, 2 tigers and one bear is proportional to

p_lion**3 * p_tiger**2 * p_bear**1
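As a quick check, this expression can be evaluated for any candidate set of prevalences. The sketch below is plain Python and does not use thinkbayes; the function name `likelihood` and the two candidate hypotheses are assumptions chosen for illustration. It leaves out the multinomial coefficient, which is the same for every hypothesis and so does not affect the comparison.

```python
def likelihood(p_lion, p_tiger, p_bear, counts=(3, 2, 1)):
    """Unnormalized multinomial likelihood of the observed counts."""
    n_lion, n_tiger, n_bear = counts
    return p_lion**n_lion * p_tiger**n_tiger * p_bear**n_bear

# Two hypothetical sets of prevalences: all equal, and equal to the observed rates.
uniform = likelihood(1/3, 1/3, 1/3)
observed = likelihood(3/6, 2/6, 1/6)
print(uniform, observed)
```

The hypothesis that matches the observed rates gets the higher likelihood of the two.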

An approach that is tempting, but not correct, is to use beta distributions, as in “The beta distribution,” to describe the prevalence of each species separately. For example, we saw 3 lions and 3 non-lions; if we think of that as 3 “heads” and 3 “tails,” then the posterior distribution of p_lion is:

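    import thinkbayes

    # Default Beta(1, 1) prior, updated with 3 "heads" (lions) and 3 "tails" (non-lions)
    beta = thinkbayes.Beta()
    beta.Update((3, 3))
    print(beta.MaximumLikelihood())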

The maximum likelihood estimate for p_lion is the observed rate, 50%. Similarly the MLEs for p_tiger and p_bear are 33% and 17%.
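These figures can be reproduced without the library. Assuming the uniform prior corresponds to Beta(1, 1), seeing k animals of a species out of n gives a Beta(k + 1, n - k + 1) posterior, whose mode is k / n. The helper name `beta_mode` is hypothetical and not part of thinkbayes.

```python
def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) distribution, valid for alpha, beta > 1."""
    return (alpha - 1) / (alpha + beta - 2)

counts = {'lion': 3, 'tiger': 2, 'bear': 1}
n = sum(counts.values())

modes = {}
for species, k in counts.items():
    # Uniform Beta(1, 1) prior updated with k "heads" and n - k "tails".
    modes[species] = beta_mode(1 + k, 1 + n - k)

print(modes)
```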

But there are two problems:

  1. We have implicitly used a prior for each species that is uniform from 0 to 1, but since we know that there are three species, that prior is not correct. The right prior should have a mean of 1/3, and there should be zero probability that any species has a prevalence of 100%.

  2. The distributions for each species are not independent, because the prevalences have to add up to 1. To capture this ...
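Both problems show up in a few lines of arithmetic. The sketch below assumes the same Beta(1, 1) prior for each species and uses the standard mean of a beta distribution, alpha / (alpha + beta); the variable names are illustrative only.

```python
counts = {'lion': 3, 'tiger': 2, 'bear': 1}
n = sum(counts.values())

# Problem 1: a uniform Beta(1, 1) prior has mean 1/2, not 1/3.
prior_mean = 1 / (1 + 1)

# Problem 2: posterior means from three separate updates need not sum to 1.
post_means = {s: (1 + k) / (2 + n) for s, k in counts.items()}
total = sum(post_means.values())

print(prior_mean, post_means, total)
```

With these counts the three posterior means add up to 1.125, which cannot be a valid set of prevalences.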
