BLAST by Joseph Bedell, Mark Yandell, Ian Korf

This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Aligning Transcripts to Genomic Sequence
Determining the correct exon-intron structure of genes isn’t always easy. One very
successful approach is to align transcripts back to their origin in a genome. However,
this isn’t as simple as it may appear. Several solutions to this problem use either
global or local techniques.
A global alignment between a transcript and a genomic sequence is expected to have
huge gaps corresponding to the introns. In eukaryotic genomes, such as the human
genome, exons may account for only 1 to 2 percent of a genome. As a result, the gap
scores may completely dominate the scoring function, and the alignment may be of
little consequence. If the gap costs are too low, the alignment may spread out, and
exons may not be faithfully aligned. These problems are largely solved by gapping
with double affine penalties, but there are still potential problems with short exons
and introns.
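To make the idea of double affine penalties concrete, here is a minimal Python sketch. It assumes the common formulation in which a gap is charged under two affine schemes and the cheaper one applies: one scheme is cheap to open but expensive to extend (suited to small indels), and the other is expensive to open but cheap to extend (suited to introns). The function names and the penalty values are illustrative assumptions, not parameters taken from any particular alignment program.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` bases under a single affine penalty."""
    return gap_open + gap_extend * length


def double_affine_gap_cost(length, short=(40, 20), long=(400, 1)):
    """Cost of one gap under two affine penalties; the cheaper one applies.

    `short` and `long` are (open, extend) pairs. The default values are
    made up for illustration only.
    """
    if length <= 0:
        return 0
    return min(
        affine_gap_cost(length, *short),
        affine_gap_cost(length, *long),
    )


if __name__ == "__main__":
    # Compare a small indel, a borderline gap, and an intron-sized gap.
    for gap_length in (3, 19, 1000):
        print(gap_length, double_affine_gap_cost(gap_length))
```

With these assumed values, a 3-base gap is charged under the short scheme, while a 1,000-base gap falls under the long scheme and costs far less than it would under the short scheme alone. That is what keeps intron-sized gaps from dominating the score without making every small gap cheap.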
A standard local alignment between a transcript and a genome typically identifies the
longest exon as the maximum scoring pair. This isn’t as useful, but many local alignment
algorithms, like BLAST, produce more than one alignment. With these variants,
mapping a transcript back to a genome is simply a matter of chaining the
individual alignments together. This turns out to be another tricky problem, but it
works well most of the time.
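The chaining step can be sketched as a small dynamic program. The Python example below is one simple way to do it, not the method of any specific tool: it assumes each local alignment has already been reduced to a hypothetical `Hit` record with half-open coordinates on the transcript and the genome, considers only one strand, and charges nothing for the intron between two hits. A hit may follow another only if it starts at or after the point where the previous hit ends in both sequences.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One local alignment (hypothetical record, half-open coordinates)."""
    q_start: int  # start in the transcript
    q_end: int    # end in the transcript
    s_start: int  # start in the genomic sequence
    s_end: int    # end in the genomic sequence
    score: int


def best_chain(hits: List[Hit]) -> List[Hit]:
    """Return the highest-scoring set of collinear, non-overlapping hits."""
    if not hits:
        return []
    ordered = sorted(hits, key=lambda h: (h.q_start, h.s_start))
    best = [h.score for h in ordered]  # best chain score ending at hit i
    prev = [-1] * len(ordered)         # predecessor of hit i in that chain
    for i, cur in enumerate(ordered):
        for j in range(i):
            before = ordered[j]
            # `before` must end before `cur` begins in both sequences.
            if before.q_end <= cur.q_start and before.s_end <= cur.s_start:
                if best[j] + cur.score > best[i]:
                    best[i] = best[j] + cur.score
                    prev[i] = j
    # Trace back from the hit that ends the best-scoring chain.
    i = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

Given three collinear exon hits and one stray hit elsewhere in the genome, `best_chain` returns the three exons in order and drops the stray. The quadratic loop is fine for the handful of alignments a single transcript produces. The hard cases mentioned above, such as very short exons that produce no hit at all, are not addressed by this sketch.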
Final Thoughts
When you use the Needleman-Wunsch or Smith-Waterman algorithms to find the
maximum scoring alignment, you’re playing by computational, not biological, rules.
As a result, the maximum scoring alignment only approximates the truth. However,
even if all the nuances of biology were clear and you could code this in a computer
algorithm, you might still favor the approximation because the computational cost of
the correct algorithm can be excessive. In any case, the fact that you can align the
unrelated words pelican and coelacanth merits consideration. It’s possible to align any
two sequences; finding proper meaning in alignments is up to the user, not the algorithm.
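The point about pelican and coelacanth is easy to reproduce. The following is a minimal Python sketch of Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (+1 match, -1 mismatch, -1 gap) are assumptions chosen for simplicity, and the function name is illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two strings; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill: each cell is the best of a diagonal, up, or left move.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to the origin.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("pelican", "coelacanth")
    print(total)
    print(top)
    print(bottom)
```

The function always returns an alignment, whatever the inputs. Nothing in the score says whether the two strings share any history. That judgment stays with the person reading the output.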
Further Reading
For more information on the Perl programming language, consider these books:
Christiansen, Tom and Nathan Torkington, Perl Cookbook (O’Reilly).
Schwartz, Randal L. and Tom Phoenix, Learning Perl (O’Reilly).
Tisdall, James, Beginning Perl for Bioinformatics (O’Reilly).
Wall, Larry, Tom Christiansen, and Jon Orwant, Programming Perl (O’Reilly).
Chapter 3: Sequence Alignment
The following texts are indispensable resources for information on sequence alignment
and algorithms in general:
Cormen, Thomas H. et al., Introduction to Algorithms (MIT Press).
Gusfield, Dan, Algorithms on Strings, Trees, and Sequences: Computer Science and
Computational Biology (Cambridge University Press).
