Chapter 15. Dealing with Dimensions
Belly button bacteria
Belly Button Biodiversity 2.0 (BBB2) is a nationwide citizen science project with the goal of identifying bacterial species that can be found in human navels (http://bbdata.yourwildlife.org). The project might seem whimsical, but it is part of an increasing interest in the human microbiome, the set of microorganisms that live on human skin and other parts of the body.
In their pilot study, BBB2 researchers collected swabs from the navels of 60 volunteers, used multiplex pyrosequencing to extract and sequence fragments of 16S rDNA, then identified the species or genus the fragments came from. Each identified fragment is called a “read.”
We can use these data to answer several related questions:
Based on the number of species observed, can we estimate the total number of species in the environment?
Can we estimate the prevalence of each species; that is, the fraction of the total population belonging to each species?
If we are planning to collect additional samples, can we predict how many new species we are likely to discover?
How many additional reads are needed to increase the fraction of observed species to a given threshold?
These questions make up what is called the Unseen Species problem.
Lions and tigers and bears
I’ll start with a simplified version of the problem where we know that there are exactly three species. Let’s call them lions, tigers and bears. Suppose we visit a wild animal preserve and see 3 lions, 2 tigers and 1 bear. ...
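To make the three-species setup concrete, here is a minimal sketch of one way to estimate prevalences from these sightings. It assumes a uniform Dirichlet prior (one pseudo-count per species) and equal chances of spotting any individual animal. Both are illustrative assumptions, not something the text above has established, and the variable names are hypothetical.

```python
from fractions import Fraction

# Observed sightings from the visit to the preserve.
observed = {"lions": 3, "tigers": 2, "bears": 1}

# Assumption: a uniform Dirichlet prior, i.e. one pseudo-count per species.
prior = {species: 1 for species in observed}

# Conjugate update: add each observed count to its prior parameter.
posterior = {species: prior[species] + observed[species] for species in observed}

# The posterior mean prevalence of a species is its parameter
# divided by the sum of all parameters.
total = sum(posterior.values())
posterior_mean = {species: Fraction(alpha, total) for species, alpha in posterior.items()}

for species, mean in posterior_mean.items():
    print(f"{species}: {mean} (about {float(mean):.3f})")
```

Under these assumptions the posterior parameters are 4, 3 and 2, so the estimated prevalences are 4/9, 1/3 and 2/9. The pseudo-counts pull each estimate slightly toward 1/3 relative to the raw fractions 3/6, 2/6 and 1/6.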