
This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 7: A BLAST Statistics Tutorial
melanogaster genome. On the other hand, it appears that looking for short
cis-regulatory elements (shorter than 15 base pairs) using either version of BLASTN
with the default parameters is unlikely to be successful.
So what was the unreported WU-BLASTN Expect? Let’s calculate it. With the data
in Table 7-3 and the previously calculated effective HSP length of 294, first calculate
m´ and n´ using the Perl functions effectiveLengthSeq and effectiveLengthDB.
Plugging m´ and n´ together with the WU-BLASTN λ and k and a raw score of 125 into
the rawScoreToExpect function gives an Expect of 281. Recall that the NCBI-BLASTN
Expect was 1e-6. That’s a 281-million-fold difference. BLAST is clearly
parameter-sensitive! Using the default parameters, you instructed NCBI-BLASTN to search for
short highly conserved regions, and it found one. WU-BLASTN, on the other hand,
is parameterized to look for large regions of relatively low percent identity. This
would be fine for cross-species searches of poorly conserved exons but is
inappropriate for finding oligos.
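
The calculation described above can be sketched in Perl as follows. This is a minimal illustration, not the chapter's own listing: the argument order of the three functions, the 1/k floor on the effective lengths, and the use of the standard Karlin-Altschul form E = k·m´·n´·exp(-λS) are assumptions. The query length, database size, sequence count, λ, and k below are hypothetical placeholders, since the Table 7-3 values are not reproduced here. Only the effective HSP length (294) and the raw score (125) come from the text, so the sketch will not reproduce the Expect of 281 until the real Table 7-3 values are substituted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Effective query length m': the actual length minus the expected HSP
# length, floored at 1/k so it cannot fall to zero or below (assumed
# convention).
sub effectiveLengthSeq {
    my ($m, $hsp_len, $k) = @_;
    my $m_prime = $m - $hsp_len;
    return $m_prime < 1 / $k ? 1 / $k : $m_prime;
}

# Effective database length n': total letters minus one expected HSP
# length per database sequence, with the same floor.
sub effectiveLengthDB {
    my ($n, $hsp_len, $num_seqs, $k) = @_;
    my $n_prime = $n - $num_seqs * $hsp_len;
    return $n_prime < 1 / $k ? 1 / $k : $n_prime;
}

# Karlin-Altschul equation: E = k * m' * n' * exp(-lambda * S).
sub rawScoreToExpect {
    my ($raw_score, $k, $lambda, $m_prime, $n_prime) = @_;
    return $k * $m_prime * $n_prime * exp(-$lambda * $raw_score);
}

# HYPOTHETICAL placeholders: replace with the Table 7-3 values.
my ($m, $n, $num_seqs) = (1_000, 120_000_000, 1_000);
my ($lambda, $k)       = (0.2, 0.2);

# Values taken from the text.
my ($hsp_len, $raw_score) = (294, 125);

my $m_prime = effectiveLengthSeq($m, $hsp_len, $k);
my $n_prime = effectiveLengthDB($n, $hsp_len, $num_seqs, $k);
my $expect  = rawScoreToExpect($raw_score, $k, $lambda, $m_prime, $n_prime);

printf "m' = %d, n' = %d, Expect = %.3g\n", $m_prime, $n_prime, $expect;
```

Because the Expect depends on exp(-λS), even a small change in λ shifts the result by orders of magnitude, which is why the two programs can report such different values for the same alignment.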
Using BLAST intelligently requires using the correct parameters for the task at hand
and not placing too much faith in the reported Expect. See the section on BLAST
protocols in Chapter 9 for practical