This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
ac.jp), the European Molecular Biology Laboratory (EMBL, http://www.embl.org),
and GenBank from the National Center for Biotechnology Information (NCBI,
http://ncbi.nlm.nih.gov/GenBank). This consortium collaborates to form the
largest public repository for DNA and protein sequences in the world. Because it
is such an important resource, this chapter spends some time exploring it.
The amount of publicly available sequence data has been growing geometrically,
doubling approximately every 14 months (see Figure 11-2). Fortunately, computer
technology has kept pace. GenBank is currently approaching 100 GB and will reach
half a terabyte in a few years, which may seem scary, but because computing
capacity is growing along with it, this isn't going to be a problem. Not every
database grows so fast, though. Organism-specific databases such as the
Saccharomyces Genome Database, WormBase, and FlyBase are growing at a more
moderate pace, principally because the sequences of their genomes are complete.
But many new genome projects are just getting started, and their databases will
probably grow very quickly.
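As a rough check on these figures, the short Python sketch below works out how
long a database takes to grow from 100 GB to half a terabyte. It assumes the
14-month doubling time stays constant, which is a simplification. The function
names months_to_reach and projected_size are hypothetical helpers introduced
only for this illustration.

```python
import math


def months_to_reach(current_size, target_size, doubling_months=14.0):
    """Months needed to grow from current_size to target_size.

    Assumes steady geometric growth with a fixed doubling time
    (14 months by default, the figure quoted in the text).
    """
    if current_size <= 0 or target_size <= 0:
        raise ValueError("sizes must be positive")
    # Number of doublings required, which may be fractional.
    doublings = math.log2(target_size / current_size)
    return doublings * doubling_months


def projected_size(current_size, months, doubling_months=14.0):
    """Size after the given number of months under the same assumption."""
    return current_size * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # 100 GB growing to 500 GB (half a terabyte).
    months = months_to_reach(100, 500)
    print(f"{months:.1f} months (~{months / 12:.1f} years)")
    # prints: 32.5 months (~2.7 years)
```

Under that assumption, a fivefold increase takes about 2.3 doublings, or a
little under three years, which is consistent with "a few years."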
Sequence databases usually offer their data in several different formats. The
FASTA format is universally accepted for operating on sequences, but many
sequence databases record a lot more data than just the sequence. Such extra
information is commonly presented in a human-readable format called a flat file.
The INSD uses two kinds of flat files: the DDBJ and GenBank flat file formats
are identical, while the EMBL format is slightly different. The following
DDBJ/GenBank record corresponds to a fragment of the Hoxa-11 gene from the
coelacanth (the ancient fish on the cover of the book):
LOCUS       AF287139                 606 bp    DNA     linear   VRT 10-DEC-2000
DEFINITION  Latimeria chalumnae Hoxa-11 gene, partial cds.
VERSION     AF287139.1  GI:11611818
SOURCE      Latimeria chalumnae.
  ORGANISM  Latimeria chalumnae
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Coelacanthiformes; Coelacanthidae; Latimeria.
REFERENCE   1  (bases 1 to 606)
  AUTHORS   Chiu,C.H., Nonaka,D., Xue,L., Amemiya,C.T. and Wagner,G.P.
  TITLE     Evolution of Hoxa-11 in lineages phylogenetically positioned along
            the fin-limb transition
  JOURNAL   Mol. Phylogenet. Evol. 17 (2), 305-316 (2000)
REFERENCE   2  (bases 1 to 606)