
BLAST by Joseph Bedell, Mark Yandell, Ian Korf


This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 12: Hardware and Software Optimizations
Optimized NCBI-BLAST
The source code for NCBI-BLAST is in the public domain, and anyone can modify it
without restriction (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools). It’s therefore not
surprising that there are a number of variants. The rest of this chapter discusses
three of them.
Apple/Genentech BLAST
Macintosh G4 computers have an additional vector processing unit, called Velocity
Engine or AltiVec, that can apply a single instruction to several data elements in
parallel. Apple Computer and Genentech collaborated to rewrite portions of
NCBI-BLAST to take advantage of the AltiVec unit. These modifications affect the
seeding phase of BLASTN. The result, AG-BLAST, significantly outperforms
NCBI-BLAST under certain conditions.
Table 12-5 shows an experiment in which a Caenorhabditis elegans transcript
(F44B9.10) was searched against the Caenorhabditis briggsae genome using various
word sizes but otherwise default parameters (the hardware is a 550-MHz
PowerBook). For cross-species work, it’s generally a good idea to employ word sizes
slightly smaller than the default 11 to minimize the chance of missing meaningful
similarities. At word sizes 8 through 10, AG-BLAST is 1.5 to 8.5 times faster than
NCBI-BLAST, while at word sizes 11 and 15 the two perform about the same.
AG-BLAST also runs faster at very large word sizes (20 and above), which is useful
if you are matching sequences that are expected to be identical or nearly identical
(e.g., mapping ESTs to their own genome).
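As a rough illustration of varying the word size, the sketch below builds command lines for a few values of W. It assumes the legacy NCBI blastall program, whose -p, -d, -i, and -W options select the program, database, query file, and word size. The file and database names are hypothetical placeholders, not the ones used in the experiment.

```python
from typing import List


def blastn_command(query: str, database: str, word_size: int,
                   executable: str = "blastall") -> List[str]:
    """Return an argv list for a BLASTN search with the given word size.

    Assumes the legacy blastall command-line interface.
    """
    return [
        executable,
        "-p", "blastn",        # program
        "-d", database,        # formatted nucleotide database
        "-i", query,           # query FASTA file
        "-W", str(word_size),  # word size
    ]


# Hypothetical file and database names, for illustration only.
COMMANDS = [blastn_command("transcript.fasta", "briggsae", w)
            for w in (8, 9, 10, 11)]
```

Each entry in COMMANDS can be passed to subprocess.run and timed separately to compare word sizes on your own hardware.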
Table 12-4. Serial BLAST performance
#  First search           Second search  Speed  Elapsed time (sec)  HSPs
1  W=3 T=12               None             1 x  883.3               251
2  W=3 T=14 hitdist=40    None             7 x  121.4               186
3  W=3 T=999 hitdist=40   W=3 T=12        14 x   62.1               230
4  W=4 T=999              W=3 T=12        18 x   49.1               248
5  W=5 T=999              W=3 T=12        50 x   17.6               219
6  W=4 T=999 hitdist=40   W=3 T=12        80 x   11.1               137
7  W=5 T=999 hitdist=40   W=3 T=12       110 x    7.9               116
AG-BLAST does have a few disadvantages. First, the version may be slightly out of
date with respect to NCBI-BLAST. The current version of AG-BLAST is based on
2.2.2, while NCBI-BLAST is up to Version 2.2.6. Not all changes are
backward-compatible; for example, the latest preformatted databases require
Version 2.2.5. Second, AG-BLAST doesn’t work with multiple CPUs. You can execute
more than one job at a time, but you can’t use the -a option to increase the number
of CPUs used by a single process. Finally, the minimum word size on AG-BLAST is 8,
one greater than the NCBI-BLAST minimum of 7. See
http://developer.apple.com/hardware/ve/acgresearch.html for more information.
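Because a single process can't be spread across CPUs, one workaround is to run several independent searches side by side. The following is a minimal sketch of that idea using only the Python standard library; run_jobs is a hypothetical helper, and it makes no assumptions about how AG-BLAST itself is invoked beyond taking ordinary argv lists.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_jobs(commands: Sequence[List[str]], max_jobs: int) -> List[int]:
    """Run each command as its own process, at most max_jobs at a time.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(argv: List[str]) -> int:
        # Each worker thread just waits on one child process.
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, commands))
```

On a dual-CPU machine you might call run_jobs(commands, max_jobs=2), where each command is one complete search against one query file. Whether AG-BLAST accepts the same command-line options as NCBI blastall is an assumption you should check against its documentation.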
Paracel-BLAST and BlastMachine
Paracel makes an NCBI-BLAST derivative called Paracel-BLAST and sells it with a
prepackaged computer cluster called a BlastMachine. This product combines the
high-performance hardware and software tricks into a single, easy-to-use
package. The hardware is a rack of Linux-Intel machines, and the DRM software is
Platform LSF. Large query sequences are chopped, small ones are packed, and data is
distributed so the search comes back as fast as possible. This is convenient
because it lets users concentrate on what they want to do and not how they have to
do it. In the end, more science and less frustration is a good thing.
See http://www.paracel.com for more information.
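The idea of chopping large queries and packing small ones can be sketched in a few lines. This is only an illustration of the general technique, not Paracel's implementation: the overlap between chunks and the greedy packing rule are assumptions, and chop and pack are hypothetical names.

```python
from typing import Iterable, List, Tuple


def chop(sequence: str, chunk_size: int, overlap: int) -> List[Tuple[int, str]]:
    """Split a long query into overlapping chunks as (offset, chunk) pairs.

    The overlap (an assumption here) keeps alignments that span a chunk
    boundary from being cut in half.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not sequence:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, sequence[start:start + chunk_size]))
        if start + chunk_size >= len(sequence):
            break  # this chunk reached the end of the sequence
        start += step
    return chunks


def pack(sequences: Iterable[str], max_batch_length: int) -> List[List[str]]:
    """Greedily group short queries into batches of limited total length.

    A sequence longer than the limit ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_length = 0
    for seq in sequences:
        if current and current_length + len(seq) > max_batch_length:
            batches.append(current)
            current, current_length = [], 0
        current.append(seq)
        current_length += len(seq)
    if current:
        batches.append(current)
    return batches
```

The offsets returned by chop let you translate hit coordinates in a chunk back to coordinates in the original query once the results are merged.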
TimeLogic Tera-BLAST
TimeLogic uses an entirely different approach to optimizing BLAST. The BLAST
algorithm is soft-wired into a special kind of chip called a field programmable gate
array (FPGA). Each FPGA executes the search very quickly, and multiple FPGA
boards reside in a single computer called a DeCypher accelerator. The end result is a
specialized computer that is limited in what it can do, but what it does, it does
astonishingly well. A single DeCypher accelerator running Tera-BLAST (the name for their
Table 12-5. Apple/Genentech BLAST
W   NCBI-BLAST (sec)  AG-BLAST (sec)  Speed increase
8   56.9              37.9            1.5 x
9   50.0               9.5            5.3 x
10  46.6               5.5            8.5 x
11   2.9               2.8            1.0 x
15   2.1               2.1            1.0 x
20   1.4               1.0            1.4 x
30   1.4               0.6            2.3 x
40   1.4               0.5            2.8 x
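The speed increase column in Table 12-5 appears to be the NCBI-BLAST time divided by the AG-BLAST time, rounded to one decimal place. The short sketch below recomputes it from the timings in the table; the names TIMINGS, speed_increase, and SPEEDUPS are illustrative only.

```python
# (word size, NCBI-BLAST seconds, AG-BLAST seconds), copied from Table 12-5
TIMINGS = [
    (8, 56.9, 37.9),
    (9, 50.0, 9.5),
    (10, 46.6, 5.5),
    (11, 2.9, 2.8),
    (15, 2.1, 2.1),
    (20, 1.4, 1.0),
    (30, 1.4, 0.6),
    (40, 1.4, 0.5),
]


def speed_increase(ncbi_seconds: float, ag_seconds: float) -> float:
    """Ratio of NCBI-BLAST time to AG-BLAST time, to one decimal place."""
    return round(ncbi_seconds / ag_seconds, 1)


# Map each word size to its recomputed speed increase.
SPEEDUPS = {w: speed_increase(ncbi, ag) for w, ncbi, ag in TIMINGS}
```

The recomputed values match the table, which confirms that the gain is concentrated at word sizes 9 and 10 and disappears at the default of 11.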
