
Example: Finding Microbes to Clean Up Oil Spills ◾ 205
abstract collection of 163,762 documents covering 1689 microbes. We
will use this data set to rank the organisms according to how likely they
are to be of interest for our application.
ORGANISM RANKING STRATEGY
We can think of each organism as being represented by the collection
of papers written about it. We can summarize this collection as a
centroid vector in word/phrase space, as described previously. These
centroids can then be used to create a distance matrix that gives the
distance between every pair of organisms. Think of this as a fully connected ...
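As a rough illustration of the centroid and distance-matrix idea, the following minimal sketch averages each organism's document vectors into a centroid and then computes pairwise distances. The organism names, the three-term toy vectors, the helper function names, and the choice of Euclidean distance are all assumptions made for this example; the text does not specify which distance measure is used.

```python
import math

def centroid(doc_vectors):
    """Average equal-length term-weight vectors into a single centroid."""
    n = len(doc_vectors)
    dims = len(doc_vectors[0])
    return [sum(vec[i] for vec in doc_vectors) / n for i in range(dims)]

def euclidean(a, b):
    """Euclidean distance (an assumed metric, not taken from the text)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(centroids):
    """Return sorted organism names and the matrix of pairwise distances."""
    names = sorted(centroids)
    matrix = [[euclidean(centroids[r], centroids[c]) for c in names]
              for r in names]
    return names, matrix

# Hypothetical toy data: each organism maps to the vectors of its papers,
# expressed in a made-up three-term word/phrase space.
papers = {
    "organism_a": [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
    "organism_b": [[0.0, 2.0, 0.0]],
    "organism_c": [[2.0, 0.0, 0.0], [2.0, 4.0, 0.0]],
}

centroids = {name: centroid(vecs) for name, vecs in papers.items()}
names, dist = distance_matrix(centroids)

for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

The resulting matrix is symmetric with zeros on the diagonal, so every organism has a distance to every other organism. A real collection would use high-dimensional sparse vectors, but the structure of the computation would be the same.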