This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
314 | Glossary
1,000 cells as an adult. C. elegans was the
first animal to have its complete genome
sequenced. See http://www.wormbase.org.
The abbreviation for a coding sequence.
CDS isn’t synonymous with exon, since
exons may contain noncoding sequence.
Three contiguous letters of DNA or RNA.
Each of the 64 codons specifies either an
amino acid or a translation stop.
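The count of 64 follows from three positions with four possible letters each (4 x 4 x 4). A minimal Perl sketch that enumerates them, assuming the DNA alphabet A, C, G, T (for RNA, U replaces T); the name all_codons is hypothetical:

```perl
use strict;
use warnings;

# Build every three-letter word over the DNA alphabet: 4 * 4 * 4 = 64 codons.
sub all_codons {
    my @nt = qw(A C G T);
    my @codons;
    for my $x (@nt) {
        for my $y (@nt) {
            for my $z (@nt) {
                push @codons, "$x$y$z";
            }
        }
    }
    return @codons;
}

my @codons = all_codons();
print scalar(@codons), " codons, from $codons[0] to $codons[-1]\n";
# prints "64 codons, from AAA to TTT"
```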
The complement of a DNA sequence is
the sequence on the other strand. For
example, the complement of ACCCGT is
TGGGCA. To complement a sequence in
Perl, use either of the following:
# 4-letter alphabet
$dna =~ tr/ACGT/TGCA/;
# 15-letter alphabet (N is its own complement, so it isn't listed)
$dna =~ tr[ACGTRYWSKMBDHV]
          [TGCAYRWSMKVHDB];
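Because tr modifies $dna in place, it can be convenient to wrap the translation in a subroutine that returns a copy. The sketch below does that and also derives the reverse complement, which is the other strand read in its own 5'-to-3' direction. The subroutine names are hypothetical, and handling lowercase letters is an added assumption:

```perl
use strict;
use warnings;

# Return the complement of a sequence without modifying the argument.
# Covers the IUPAC ambiguity codes in both cases; N is its own complement
# and is left untouched because it is not in the search list.
sub complement {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ACGTRYWSKMBDHVacgtrywskmbdhv/TGCAYRWSMKVHDBtgcayrwsmkvhdb/;
    return $comp;
}

# Reverse complement: complement every letter, then reverse the string.
sub reverse_complement {
    my ($seq) = @_;
    return scalar reverse(complement($seq));
}

print complement('ACCCGT'), "\n";          # prints "TGGGCA"
print reverse_complement('ACCCGT'), "\n";  # prints "ACGGGT"
```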
The common fruit fly. This is one of the
most famous organisms for genetic
research and was one of the first animals
whose complete genomic sequence was
determined. See http://www.fruitfly.org.
A common technique that reduces the
computational complexity of a problem
by finding and extending a partial optimi-
Escherichia coli. A common bacterium
normally found in your gut and a favorite
organism for molecular biology research.
Some variants cause food poisoning.
Karlin-Altschul statistics assume
sequences of infinite length. To adjust for
edge effects in real sequences, the search
space is reduced by adjusting the true
lengths of the sequences to effective
Randomness; disorder; unpredictability.
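In sequence analysis this notion is commonly quantified as Shannon entropy, H = -sum(p * log2(p)) over the letter frequencies p. The following is only an illustrative sketch; the subroutine name is hypothetical, and estimating p from the letter counts of a single sequence is an assumption:

```perl
use strict;
use warnings;

# Shannon entropy, in bits per letter, of the letter composition of a string.
sub shannon_entropy {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, $seq;
    my $n = length $seq;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $n;                 # observed frequency of one letter
        $h -= $p * log($p) / log(2);     # log() is natural log; convert to base 2
    }
    return $h;
}

printf "%.2f\n", shannon_entropy('ACGTACGT');  # four equally likely letters
printf "%.2f\n", shannon_entropy('AAAAAAAA');  # a single repeated letter
```

A sequence made of one repeated letter is fully predictable and scores 0 bits, while four equally frequent letters give the maximum for DNA, 2 bits per letter.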
Organisms with intracellular membranous
organelles such as the nucleus and
mitochondria are called eukaryotes.
A mutation that causes an insertion or
deletion of nucleotides that isn’t a
multiple of three, and therefore causes the
reading frame to change.
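The effect is easy to see by splitting a sequence into codons before and after a deletion. In this sketch the sequence and the subroutine names are made up for illustration:

```perl
use strict;
use warnings;

# Split a sequence into complete triplets, starting at the first letter.
sub codons_of {
    my ($dna) = @_;
    my @codons = $dna =~ /([A-Za-z]{3})/g;
    return @codons;
}

# Return a copy of the sequence with $len letters removed at offset $pos.
sub delete_bases {
    my ($dna, $pos, $len) = @_;
    substr($dna, $pos, $len, '');
    return $dna;
}

my $orf = 'ATGGCTGAAACCTAA';   # hypothetical open reading frame
print join(' ', codons_of($orf)), "\n";                       # ATG GCT GAA ACC TAA
print join(' ', codons_of(delete_bases($orf, 3, 3))), "\n";   # ATG GAA ACC TAA
print join(' ', codons_of(delete_bases($orf, 3, 1))), "\n";   # ATG CTG AAA CCT
```

Deleting three letters removes one codon and leaves the rest intact. Deleting a single letter changes every downstream codon, and the stop codon TAA is no longer read in frame.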
A functional unit of the genome. Unless
otherwise stated, “gene” usually means a
protein-coding gene, but many genes
don’t contain the instructions for
proteins (e.g., various RNA genes).
The mapping of codons to amino acids.
See Table 2-3.
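In code, the mapping is naturally a hash keyed by codon. The sketch below assumes the standard genetic code (NCBI translation table 1), written as a 64-letter string with codons ordered T, C, A, G at each position; it does not reproduce Table 2-3. The names %genetic_code and translate are hypothetical, and reporting unrecognized codons as X is an added convention:

```perl
use strict;
use warnings;

# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %genetic_code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Translate a DNA coding sequence, one codon at a time, from the first letter.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for my $codon (uc($dna) =~ /(...)/g) {
        $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
    }
    return $protein;
}

print translate('ATGGCTGAAACCTAA'), "\n";   # prints "MAET*"
```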
The tendency of sequences to change over
time by accumulating random mutations.
The complete genetic material for an
organism. For eukaryotes, the genome
refers to the nuclear genome and doesn’t
An alignment algorithm that requires
every letter of each sequence to appear in
the alignment. Globally aligning
sequences of very different lengths forces
the extra letters of the longer sequence to
pair with gaps or mismatches, which can
produce misleading alignments.
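The cost of forcing every letter into the alignment shows up directly in the score. The following is a minimal sketch of a Needleman-Wunsch-style global alignment score with a linear gap penalty. The scoring values (+1 match, -1 mismatch, -1 per gap) and the subroutine name are assumptions chosen for illustration, not a recommended scoring scheme:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Best global alignment score of $s and $t, keeping only two rows of the matrix.
sub global_score {
    my ($s, $t, $match, $mismatch, $gap) = @_;
    # Row 0: aligning the empty prefix of $s against $j letters costs $j gaps.
    my @prev = map { $_ * $gap } 0 .. length($t);
    for my $i (1 .. length($s)) {
        my @curr = ($i * $gap);   # column 0: $i letters of $s against nothing
        for my $j (1 .. length($t)) {
            my $pair = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1)
                     ? $match : $mismatch;
            push @curr, max($prev[$j - 1] + $pair,   # align the two letters
                            $prev[$j]     + $gap,    # gap in $t
                            $curr[$j - 1] + $gap);   # gap in $s
        }
        @prev = @curr;
    }
    return $prev[-1];   # every letter of both sequences has been consumed
}

print global_score('ACGT', 'ACGT',         1, -1, -1), "\n";   # prints 4
print global_score('ACGT', 'TTTTACGTTTTT', 1, -1, -1), "\n";   # prints -4
```

In the second call the short sequence matches perfectly inside the long one, yet the eight flanking letters must each be paired with a gap, so the global score is negative.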
In sequence analysis, homologous means
derived from a common ancestor.
Sequences are either homologous or they
aren’t. It is incorrect to say that sequences
are 80 percent homologous unless you
mean that there is an 80 percent chance of
common ancestry. Use percent identity to
describe the similarity of alignments.
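Percent identity can be computed directly from the two rows of an alignment. This sketch assumes gaps are written as "-" and divides by the total number of alignment columns; other denominators (for example, the length of the shorter sequence) are also in use, so the choice here is an assumption. The subroutine name is hypothetical:

```perl
use strict;
use warnings;

# Percent of alignment columns in which both rows carry the same letter.
sub percent_identity {
    my ($row1, $row2) = @_;
    die "aligned rows must have equal length\n"
        unless length($row1) == length($row2);
    return 0 unless length $row1;
    my $identical = 0;
    for my $i (0 .. length($row1) - 1) {
        my ($x, $y) = (substr($row1, $i, 1), substr($row2, $i, 1));
        $identical++ if $x eq $y && $x ne '-';   # a gap never counts as identity
    }
    return 100 * $identical / length($row1);
}

printf "%.1f%% identity\n", percent_identity('ACGT-A', 'ACCTTA');
# prints "66.7% identity"
```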
Literally, “likes water.” Water is a polar
molecule that mixes well with other polar
molecules. The charged amino acids K, R,
D, and E are examples of hydrophilic