## With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

No credit card required

## Predictive distributions

I introduced the hidden species problem in the form of four related questions. We have answered the first two by computing the posterior distribution for `n` and the prevalence of each species.

The other two questions are:

• If we are planning to collect additional reads, can we predict how many new species we are likely to discover?

• How many additional reads are needed to increase the fraction of observed species to a given threshold?

To answer predictive questions like this we can use the posterior distributions to simulate possible future events and compute predictive distributions for the number of species, and fraction of the total, we are likely to see.

The kernel of these simulations looks like this:

1. Choose `n` from its posterior distribution.

2. Choose a prevalence for each species, including possible unseen species, using the Dirichlet distribution.

3. Generate a random sequence of future observations.

4. Compute the number of new species, `num_new`, as a function of the number of additional reads, `k`.

5. Repeat the previous steps and accumulate the joint distribution of `num_new` and `k`.

And here’s the code. `RunSimulation` runs a single simulation:

```# class Subject

m, seen = self.GetSeenSpecies()

curve = []
for k, obs in enumerate(observations):
`num_reads` is the number of additional reads to simulate. ...