I’ll start with a simplified version of the problem where we know that there are exactly three species. Let’s call them lions, tigers and bears. Suppose we visit a wild animal preserve and see 3 lions, 2 tigers and one bear.
If we have an equal chance of observing any animal in the preserve,
the number of each species we see is governed by the multinomial
distribution. If the prevalences of lions, tigers and bears are p_lion, p_tiger and p_bear, the likelihood of seeing 3 lions, 2 tigers and one bear is proportional to
p_lion**3 * p_tiger**2 * p_bear**1
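To make this expression concrete, here is a minimal sketch in plain Python that evaluates it for one set of prevalences. The function name `multinomial_likelihood` and the example prevalences are hypothetical, chosen only for illustration, and are not part of thinkbayes. The sketch also computes the multinomial coefficient, which depends only on the counts; because it is the same for every hypothesis about the prevalences, it can be dropped when hypotheses are compared.

```python
from math import factorial

def multinomial_likelihood(counts, prevalences):
    """Return (kernel, full): the product of p**k terms, and that
    product multiplied by the multinomial coefficient."""
    kernel = 1.0
    for k, p in zip(counts, prevalences):
        kernel *= p ** k

    # Multinomial coefficient n! / (k1! * k2! * ...), depends only on counts.
    coeff = factorial(sum(counts))
    for k in counts:
        coeff //= factorial(k)

    return kernel, coeff * kernel

counts = [3, 2, 1]        # lions, tigers, bears observed
hypo = [0.5, 0.3, 0.2]    # hypothetical prevalences; they must sum to 1
kernel, full = multinomial_likelihood(counts, hypo)
print('kernel = %g, full likelihood = %g' % (kernel, full))
```

For these counts the coefficient is 60, so the full likelihood is 60 times the kernel regardless of which prevalences are tested.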
An approach that is tempting, but not correct, is to use beta
distributions, as in The beta distribution, to describe the prevalence
of each species separately. For example, we saw 3 lions and 3 non-lions;
if we think of that as 3 “heads” and 3 “tails,” then the posterior
distribution of p_lion is:
import thinkbayes

# Start from the default Beta distribution, then update with (heads, tails).
beta = thinkbayes.Beta()
beta.Update((3, 3))
print(beta.MaximumLikelihood())
The maximum likelihood estimate for p_lion is the observed rate, 50%. Similarly, the MLEs for p_tiger and p_bear are 33% and 17%.
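These figures can be reproduced without any library. The sketch below assumes a uniform Beta(1, 1) prior for each species, so that after seeing k animals of that species out of n the posterior is Beta(1 + k, 1 + n - k); the helper name `beta_mode` is hypothetical. It uses the standard formula for the mode of a beta distribution, which in this case reduces to the observed rate k / n.

```python
def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) distribution, valid for alpha, beta > 1."""
    return (alpha - 1) / float(alpha + beta - 2)

counts = {'lion': 3, 'tiger': 2, 'bear': 1}
n = sum(counts.values())

mle = {}
for species, k in counts.items():
    # Posterior under a uniform prior: Beta(1 + k, 1 + n - k).
    mle[species] = beta_mode(1 + k, 1 + n - k)

for species in ['lion', 'tiger', 'bear']:
    print('%s: %.0f%%' % (species, 100 * mle[species]))
```

This prints 50%, 33% and 17% for lions, tigers and bears respectively.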
But there are two problems:
1. We have implicitly used a prior for each species that is uniform from 0 to 1, but since we know that there are three species, that prior is not correct. The right prior should have a mean of 1/3, and there should be zero likelihood that any species has a prevalence of 100%.
2. The distributions for each species are not independent, because the prevalences have to add up to 1. To capture this ...
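Both problems can be checked numerically. The following sketch again assumes that each species gets its own uniform Beta(1, 1) prior and is updated separately; the helper name `beta_mean` is hypothetical. It shows that the prior mean is 1/2 rather than 1/3, and that the three posterior means, computed independently, do not add up to 1.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / float(alpha + beta)

counts = {'lion': 3, 'tiger': 2, 'bear': 1}
n = sum(counts.values())

# Problem 1: the uniform prior has mean 1/2, not 1/3.
prior_mean = beta_mean(1, 1)

# Problem 2: separate posteriors ignore the constraint that
# the prevalences must sum to 1.
posterior_means = {}
for species, k in counts.items():
    posterior_means[species] = beta_mean(1 + k, 1 + n - k)

total = sum(posterior_means.values())
print('prior mean = %g' % prior_mean)
print('sum of posterior means = %g' % total)
```

The posterior means are 4/8, 3/8 and 2/8, which sum to 1.125, so the three separate estimates cannot all describe prevalences of the same population.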