
This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Sequence Database Management Strategies
human ESTs from the EST division, as well as all the mRNAs from the PRI (primate)
division. But if you’ve designated all ESTs and mRNAs as the cDNA moltype, getting
all human transcripts is as easy as retrieving all records in which the species is
Homo sapiens and the moltype is cDNA. You can add several more fields to the
database, like date created, division, keywords, etc., and get quite a bit of functionality
without much more complexity.
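As a minimal sketch of the species-and-moltype query described above, the following Python example stores one row of metadata per record in an in-memory SQLite table. The table name seq_index, the column names, and the accessions are all made up for illustration; they are assumptions, not a schema from the text or from any particular package.

```python
import sqlite3


def build_index(records):
    """Create an in-memory table with one row of metadata per sequence record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seq_index ("
        "accession TEXT PRIMARY KEY, species TEXT, moltype TEXT, division TEXT)"
    )
    conn.executemany("INSERT INTO seq_index VALUES (?, ?, ?, ?)", records)
    return conn


def find_accessions(conn, species, moltype):
    """Return accessions whose species and moltype both match, sorted by name."""
    rows = conn.execute(
        "SELECT accession FROM seq_index "
        "WHERE species = ? AND moltype = ? ORDER BY accession",
        (species, moltype),
    )
    return [row[0] for row in rows]


# Hypothetical records for illustration only; these are not real accessions.
EXAMPLE_RECORDS = [
    ("EST0001", "Homo sapiens", "cDNA", "EST"),
    ("MRNA0001", "Homo sapiens", "cDNA", "PRI"),
    ("GEN0001", "Homo sapiens", "genomic", "PRI"),
    ("EST0002", "Mus musculus", "cDNA", "EST"),
]

conn = build_index(EXAMPLE_RECORDS)
# One query covers both divisions, because both records share the cDNA moltype.
human_transcripts = find_accessions(conn, "Homo sapiens", "cDNA")
```

Extra fields such as date created or keywords would simply become additional columns, which is why the added functionality costs so little in complexity.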
Overall, flat file indexing is a very good strategy for sequence management because it
is simple, fast, and retains the data in its original format. You don’t even have to
write any software, as both free and commercial software packages are designed
specifically for managing flat file data. Check out the Bioperl project at http://bioperl.org,
MyGenBank at http://sourceforge.net/projects/mgb, and SRS (see Table 11-4).
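To make the idea of flat file indexing concrete, here is a rough Python sketch that records the byte offset of each record in a FASTA-style file and later seeks straight to a record without modifying the file. It assumes a simple layout in which each record starts with a ">" header line; the function names and the sample data are hypothetical, and this is not how Bioperl, MyGenBank, or SRS implement their indexes.

```python
import os
import tempfile


def index_fasta(path):
    """Map each record identifier to the byte offset of its header line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:  # skip headers that carry no identifier
                    offsets[fields[0].decode("ascii")] = offset
    return offsets


def fetch_record(path, offsets, identifier):
    """Seek to a record and return its text exactly as stored in the file."""
    with open(path, "rb") as handle:
        handle.seek(offsets[identifier])
        lines = [handle.readline()]  # the header line itself
        while True:
            line = handle.readline()
            if not line or line.startswith(b">"):
                break  # end of file or start of the next record
            lines.append(line)
    return b"".join(lines).decode("ascii")


# Made-up two-record file, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.fa")
    with open(path, "w", newline="\n") as out:
        out.write(">seq1 made-up record\nACGT\nTTGA\n>seq2 another\nGGCC\n")
    offsets = index_fasta(path)
    record = fetch_record(path, offsets, "seq2")
```

Because the index holds only identifiers and offsets, the data file stays in its original format and the index can be rebuilt at any time from the file alone.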
Commercial Sequence Management Software
Several commercial software packages are designed for managing biological sequence
data. The database software is generally part of a much larger software suite that
includes sequence analysis tools such as BLAST and visualization tools to make
interpretation easier. The companies that develop these packages expend a great deal
of effort to make ...