Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Aligning Transcripts to Genomic Sequence
Determining the correct exon-intron structure of genes isn’t always easy. One very
successful approach is to align transcripts back to their origin in a genome. However,
this isn’t as simple as it may appear. Several solutions to this problem use either
global or local techniques.
A global alignment between a transcript and a genomic sequence is expected to have
huge gaps corresponding to the introns. In eukaryotic genomes, such as the human
genome, exons may account for only 1 to 2 percent of a genome. As a result, the gap
scores may completely dominate the scoring function, and the alignment may be of
little consequence. If the gap costs are too low, the alignment may spread out, and
exons may not be faithfully aligned. These problems are largely solved by gapping
with double affine penalties, but there are still potential problems with short exons.
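To see why a second penalty helps, here is a minimal Python sketch. It assumes that "double affine" means scoring each gap with the cheaper of two affine functions, one tuned for short indels and one for intron-sized gaps. The function names and all penalty values are hypothetical, chosen only for illustration.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` bases under a single affine penalty."""
    return gap_open + gap_extend * length


def double_affine_gap_cost(length, short=(5.0, 2.0), long=(40.0, 0.01)):
    """Cost of one gap under two affine penalties: the cheaper one applies.

    `short` and `long` are (gap_open, gap_extend) pairs. The values are
    illustrative assumptions, not parameters taken from any real program.
    """
    return min(affine_gap_cost(length, *short),
               affine_gap_cost(length, *long))


if __name__ == "__main__":
    for gap_length in (3, 5000):  # a small indel versus an intron-sized gap
        single = affine_gap_cost(gap_length, 5.0, 2.0)
        double = double_affine_gap_cost(gap_length)
        print(f"gap of {gap_length}: single affine {single:.2f}, "
              f"double affine {double:.2f}")
```

With these made-up numbers, a 3-base gap is charged by the short-gap penalty in both schemes, while a 5,000-base gap costs far less under the second penalty than it would under the first alone. That is the property that keeps introns from swamping the exon scores.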
A standard local alignment between a transcript and a genome typically identifies the
longest exon as the maximum scoring pair. That alone isn’t very useful, but many local
alignment algorithms, like BLAST, produce more than one alignment. With these
variants, mapping a transcript back to a genome is simply a matter of chaining the
individual alignments together. This turns out to be another tricky problem, but it
works well most of the time.
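The chaining step can be sketched as a small dynamic program. This is a simplified illustration, not the method used by any particular tool: it assumes each local alignment is reduced to transcript and genomic coordinates plus a score, that all alignments are on the same strand, and that chained alignments may not overlap. The HSP record and the chain_hsps function are hypothetical names, and the coordinates are invented.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    """One local alignment; coordinates are 1-based and inclusive."""
    q_start: int  # start on the transcript
    q_end: int    # end on the transcript
    s_start: int  # start on the genomic sequence
    s_end: int    # end on the genomic sequence
    score: int


def chain_hsps(hsps: List[HSP]) -> List[HSP]:
    """Return the highest-scoring chain of colinear, non-overlapping HSPs."""
    ordered = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    if not ordered:
        return []
    best = [h.score for h in ordered]  # best chain score ending at each HSP
    prev = [None] * len(ordered)       # predecessor index for the traceback
    for j, cur in enumerate(ordered):
        for i in range(j):
            cand = ordered[i]
            # cand may precede cur only if it ends earlier on both sequences
            if cand.q_end < cur.q_start and cand.s_end < cur.s_start:
                total = best[i] + cur.score
                if total > best[j]:
                    best[j] = total
                    prev[j] = i
    end = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while end is not None:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    hits = [
        HSP(181, 300, 9000, 9119, 120),   # third exon
        HSP(50, 90, 20000, 20040, 35),    # spurious hit elsewhere in the genome
        HSP(1, 100, 1000, 1099, 100),     # first exon
        HSP(101, 180, 5000, 5079, 80),    # second exon
    ]
    for hsp in chain_hsps(hits):
        print(hsp)
```

Part of what makes the real problem tricky is everything this sketch leaves out: overlapping alignments that must be trimmed at splice sites, both strands, penalties for implausibly long introns, and short exons that never produce an alignment at all.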
When you use the Needleman-Wunsch or Smith-Waterman algorithms to find the
maximum scoring alignment, you’re playing by computational, not biological, rules.
As a result, the maximum scoring alignment only approximates the truth. However,
even if all the nuances of biology were clear and you could code this in a computer
algorithm, you might still favor the approximation because the computational cost of
the correct algorithm can be excessive. In any case, the fact that you can align the
unrelated words pelican and coelacanth merits consideration. It’s possible to align any
two sequences; finding proper meaning in alignments is up to the user, not the algorithm.
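To make the point concrete, here is a minimal Needleman-Wunsch sketch in Python that aligns the two words. The scoring scheme (match +1, mismatch -1, a flat -1 per gap position) is an assumption chosen for simplicity, and the function name is hypothetical. The algorithm returns a maximum scoring alignment for any pair of strings, whether or not they have anything to do with each other.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: a aligned against gaps
        score[i][0] = i * gap
    for j in range(1, cols):          # first row: b aligned against gaps
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align two letters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("coelacanth", "pelican")
    print(top)
    print(bottom)
    print("score:", total)
```

The program reports an alignment and a score with no indication that a fish and a bird have been compared. Deciding whether that score means anything is left to the person reading it.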
For more information on the Perl programming language, consider these books:
Christiansen, Tom and Nathan Torkington, Perl Cookbook (O’Reilly).
Schwartz, Randal L. and Tom Phoenix, Learning Perl (O’Reilly).
Tisdall, James, Beginning Perl for Bioinformatics (O’Reilly).
Wall, Larry, Tom Christiansen, and Jon Orwant, Programming Perl (O’Reilly).