
This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Aligning Transcripts to Genomic Sequence
Determining the correct exon-intron structure of genes isn’t always easy. One very
successful approach is to align transcripts back to their origin in a genome. However,
this isn’t as simple as it may appear. Several solutions to this problem use either
global or local alignment techniques.
A global alignment between a transcript and a genomic sequence is expected to have
huge gaps corresponding to the introns. In eukaryotic genomes, such as the human
genome, exons may account for only 1 to 2 percent of a genome. As a result, if the
gap costs are high, the gap scores may completely dominate the scoring function, and
the resulting alignment may carry little biological meaning. If the gap costs are too
low, the alignment may spread out, and exons may not be faithfully aligned. These
problems are largely solved by gapping with double affine penalties, but there are
still potential problems with short exons and short introns.
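To make the idea of double affine penalties concrete, here is a minimal sketch. It assumes the usual formulation in which a gap is charged the cheaper of two affine functions: one with a low opening cost and a steep extension cost for short gaps, and one with a high opening cost and a nearly flat extension cost for intron-length gaps. The function names and every numeric parameter are illustrative assumptions, not values taken from any particular program.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` letters under a single affine function."""
    return gap_open + gap_extend * length


def double_affine_gap_cost(length, short=(5, 2), long=(50, 0.01)):
    """Cost of one gap as the cheaper of two affine functions.

    `short` and `long` are (open, extend) pairs. The defaults are
    hypothetical values chosen only to illustrate the shape of the curve.
    """
    return min(
        affine_gap_cost(length, *short),
        affine_gap_cost(length, *long),
    )


if __name__ == "__main__":
    # Compare a small indel with an intron-sized gap.
    for gap_length in (3, 10_000):
        single = affine_gap_cost(gap_length, 5, 2)
        double = double_affine_gap_cost(gap_length)
        print(gap_length, single, double)
```

With these assumed parameters, a 3-letter gap is still charged by the short-gap function, while a 10,000-letter gap costs roughly a hundredth of what the single affine function would charge, so an intron no longer swamps the match scores of the exons on either side.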
A standard local alignment between a transcript and a genome typically identifies the
longest exon as the maximum scoring pair. A single exon is less useful than the full
gene structure, but many local alignment algorithms, like BLAST, produce more than one
alignment. With these variants, mapping a transcript back to a genome is simply a
matter of chaining the individual alignments together. This ...
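The chaining step can be sketched as a small dynamic program. This is one simple way to do it under stated assumptions, not the method of any specific tool: the `HSP` record and the `chain_hsps` function are hypothetical names, all alignments are assumed to be on the same strand, overlapping alignments are never chained, and no penalty is applied for the genomic distance between consecutive alignments.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    """One local alignment between transcript (query) and genome (subject)."""
    q_start: int  # start in the transcript, 0-based, inclusive
    q_end: int    # end in the transcript, exclusive
    s_start: int  # start in the genome, 0-based, inclusive
    s_end: int    # end in the genome, exclusive
    score: float


def chain_hsps(hsps: List[HSP]) -> List[HSP]:
    """Return the highest-scoring collinear, non-overlapping chain of HSPs."""
    if not hsps:
        return []
    ordered = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    best = [h.score for h in ordered]  # best chain score ending at each HSP
    prev = [-1] * len(ordered)         # back-pointer to the previous HSP
    for j, cur in enumerate(ordered):
        for i in range(j):
            before = ordered[i]
            # `before` can precede `cur` only if it ends first in both sequences.
            collinear = before.q_end <= cur.q_start and before.s_end <= cur.s_start
            if collinear and best[i] + cur.score > best[j]:
                best[j] = best[i] + cur.score
                prev[j] = i
    # Trace back from the HSP that ends the best chain.
    j = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    return chain[::-1]


if __name__ == "__main__":
    exon1 = HSP(0, 100, 1000, 1100, 100.0)
    exon2 = HSP(100, 250, 5000, 5150, 150.0)
    exon3 = HSP(250, 300, 9000, 9050, 50.0)
    spurious = HSP(90, 140, 200, 250, 40.0)  # out-of-order hit elsewhere
    for hsp in chain_hsps([exon3, spurious, exon1, exon2]):
        print(hsp)
```

In this made-up example the three collinear alignments are kept as the exon chain and the out-of-order hit is dropped, because it cannot be placed between its neighbors without going backward in either the transcript or the genome.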