This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 7: A BLAST Statistics Tutorial
There are, of course, many reasons why you might not be able to identify an oligo in
the Drosophila melanogaster genome. First, the oligo might contain repetitive
sequence and thus be masked out. However, because WU-BLAST doesn’t mask by
default, that can’t be the reason. Second, the assembled genome may be incomplete.
Every sequenced genome to date is incomplete to some degree. In fact, a 99 percent
complete 124 Mb genome is still missing 1.24 megabases of euchromatic (nonrepetitive
DNA) sequence, leaving plenty of room for an oligo to go missing. The
incompleteness of the genome is a possible explanation for our WU-BLAST result
(Example 7-4), but is it the correct one? Before concluding that the oligo falls into a
sequencing gap, let's run NCBI-BLASTN with its default parameters. Aha! The NCBI-BLASTN
results in Example 7-5 show that the oligo is present in the Drosophila melanogaster
genome, and the HSP is assigned a significant Expect value.
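The two numbers in this argument are easy to check by hand. The short Python sketch below is an illustration only, not part of either BLAST program. It recomputes the size of the gap left by a 99 percent complete assembly, and then estimates the Expect for the top hit from the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score. The function names are hypothetical, the query length m is assumed to be the 25 aligned bases reported in Example 7-5, and n is taken as the 124,181,667 database letters reported in Example 7-4.

```python
def missing_bases(genome_size, percent_complete):
    """Bases absent from an assembly that is percent_complete percent finished."""
    return genome_size * (100 - percent_complete) / 100


def naive_expect(query_len, db_len, bit_score):
    """Estimate E = m * n * 2**(-S') from raw (unadjusted) lengths.

    Assumption: raw lengths are used here for simplicity. BLAST substitutes
    shorter "effective" lengths, so the Expect it reports is somewhat smaller
    than this estimate.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # 1 percent of a 124 Mb genome
    print(missing_bases(124_000_000, 99))
    # 25-base query, 124,181,667-letter database, 50.1-bit HSP
    print(f"{naive_expect(25, 124_181_667, 50.1):.1e}")
```

The estimate comes out at a few times 10^-6, the same order of magnitude as the 1e-06 that NCBI-BLASTN reports. The gap between the two is expected, because this sketch skips the effective-length correction.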
Example 7-4. The oligo isn’t found
Reference: Gish, W. (1996-2000) http://blast.wustl.edu
Notice: this program and its default parameter settings are optimized to find
nearly identical sequences rapidly. To identify weak similarities encoded in
nucleic acid, use BLASTX, TBLASTN or TBLASTX.
7 sequences; 124,181,667 total letters.
Sequences producing High-scoring Segment Pairs: Score P(N) N
*** NONE ***
Example 7-5. Using NCBI-BLASTN to find the oligo
                                                                   Score    E
Sequences producing significant alignments:                        (bits) Value
2R 2R.3 assembled 23-11-2001                                          50   1e-06
X X release:2 length:21666217bp Assembled X chromosome arm seque...   32   0.25
3R 3R.3                                                               32   0.25
U GenomicInterval:U                                                   30   0.99
3L 3L.3 v.3e 23351213bp BCM HGSC guide:3l-mtp-eval.08apr02            28   3.9
2L 2L release:3 length:22217931bp Assembled 2L chromosome arm se...   28   3.9
>2R 2R.3 assembled 23-11-2001
Length = 20302755
Score = 50.1 bits (25), Expect = 1e-06
Identities = 25/25 (100%)