
This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
CHAPTER 5
BLAST
Previous chapters explored what biological sequences are, how they are aligned, and
how similarity is measured. This chapter discusses BLAST itself. What is BLAST?
The simple answer is that it is a set of programs that search sequence databases for
statistically significant similarities. How BLAST actually finds those similarities
is harder to explain: a search involves multiple steps and many controlling
parameters. Understanding the theoretical framework will help you design and
interpret BLAST experiments, and give you a foundation for troubleshooting when
your search produces unexpected results.
The Five BLAST Programs
The five traditional BLAST programs are BLASTN, BLASTP, BLASTX, TBLASTN, and
TBLASTX. BLASTN compares nucleotide sequences to one another (hence the N). The
other four programs compare protein sequences (see Table 5-1).
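As a rough illustration of how the five programs divide up the possible inputs, the choice can be sketched as a small lookup in Python. The names PROGRAMS, candidate_programs, and compares_proteins are hypothetical and are not part of any BLAST distribution. The entries for the three translating programs reflect standard BLAST behavior (six-frame translation of nucleotide input) and are an assumption relative to the text above, which states only that these programs compare protein sequences.

```python
# Illustrative lookup only; all names here are hypothetical, not a BLAST API.
# Each entry gives (query type, database type) as the user supplies them,
# before any translation the program performs internally.
PROGRAMS = {
    "BLASTN": ("nucleotide", "nucleotide"),   # compared directly as nucleotides
    "BLASTP": ("protein", "protein"),         # compared directly as proteins
    "BLASTX": ("nucleotide", "protein"),      # query translated in six frames
    "TBLASTN": ("protein", "nucleotide"),     # database translated in six frames
    "TBLASTX": ("nucleotide", "nucleotide"),  # both sides translated in six frames
}


def candidate_programs(query_type, database_type):
    """Return the names of programs accepting these input types, sorted."""
    wanted = (query_type, database_type)
    return sorted(name for name, types in PROGRAMS.items() if types == wanted)


def compares_proteins(program):
    """Return True for every known program except BLASTN."""
    name = program.upper()
    if name not in PROGRAMS:
        raise KeyError(f"unknown BLAST program: {program}")
    return name != "BLASTN"
```

With nucleotide sequences on both sides, the lookup returns two candidates, BLASTN and TBLASTX. The input types alone therefore do not settle the choice. You also have to decide whether the comparison should happen at the nucleotide level or the protein level.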
Table 5-1. Traditional BLAST programs
Program  Database    Query       Typical uses
BLASTN   Nucleotide  Nucleotide  Mapping oligonucleotides, cDNAs, and PCR products to a
                                 genome; screening repetitive elements; cross-species
                                 sequence exploration; annotating genomic DNA; clustering
                                 sequencing reads; vector clipping
BLASTP   Protein     Protein     Identifying common regions between proteins; collecting
                                 related proteins for phylogenetic analyses ...