This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
The problem with the first pipeline is that if the BLAST databases are large, they may
not all fit in the file cache at once. If you don’t have enough RAM, reading each BLAST
database can evict the previously cached one, and then you get no benefit from caching.
The second structure keeps the same BLAST database in memory for all the sequences.
Before you tear apart your current pipeline, however, remember that caching isn’t going
to help much with sensitive searches, which spend most of their time computing
alignments rather than reading the database. If most of your searches are sensitive, it
is a waste of effort to optimize the already fast parts of your pipeline. As in any
tuning procedure, optimize the major bottlenecks first.
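The difference between the two structures comes down to the order of two nested loops. The following is a minimal sketch. It assumes that the first pipeline loops over query sequences on the outside and databases on the inside. The query file names, database names, and output-file naming scheme are hypothetical placeholders. Each function only prints legacy blastall command lines (a dry run), so you can inspect the order before executing anything.

```shell
#!/bin/sh
# Hypothetical inputs: placeholder query files and formatted BLAST databases.
QUERIES="q1.fa q2.fa q3.fa"
DATABASES="dbA dbB"

# First structure (sequence-major): each query is searched against every
# database in turn, so with too little RAM each database read can evict
# the database that was cached just before it.
sequence_major() {
  for q in $QUERIES; do
    for db in $DATABASES; do
      echo "blastall -p blastn -d $db -i $q -o $q.$db.out"
    done
  done
}

# Second structure (database-major): one database tends to stay in the
# file cache while every query is searched against it, then the next
# database is read.
database_major() {
  for db in $DATABASES; do
    for q in $QUERIES; do
      echo "blastall -p blastn -d $db -i $q -o $q.$db.out"
    done
  done
}
```

Piping the output to a shell, as in `database_major | sh`, would run the searches in the cache-friendly order. Keeping the command generation separate from execution makes it easy to check the ordering on a small test set first.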
CPUs and Computer Architecture
The clock speed of a CPU isn’t necessarily an accurate indicator of how fast it will
run BLAST. There are other complicating factors such as the amount of L2 cache,
the memory latency and the speed of the front-side bus. Unfortunately, there is no
good rule to predict how fast BLAST will perform on a particular computer except
for the obvious within-family predictions—for example, that a 1-GHz Pentium III
will be faster than an 800-MHz Pentium III. The best you can do is to benchmark a
bunch of systems or contact people who have already done so.
Two benchmarks are provided in Table 12-2. Before reading the description, please
understand that you should use extreme caution whenever interpreting any benchmarks,
because the benchmarking protocol may be very different from your everyday tasks and
therefore may not reflect real-world performance. The best benchmark procedure mimics
your daily routine. In addition, if you use benchmarks to decide what hardware to
purchase, you may be in for a nasty surprise, as other important considerations may
override a simplistic interpretation of the “most BLAST for the buck.” Total cost of
ownership is a complicated equation that includes maintenance, support, facilities,
cooling, and interfacing with legacy equipment and culture.
Table 12-2 shows the performance of various platforms when every sequence in a
database is searched against that same database. There are two databases, ESTs
(nucleotide, searched with blastn) and globins (protein, searched with blastp), and both
can be found at http://examples.oreilly.com/BLAST. The tests were performed using
default parameters for NCBI-BLAST. The following command lines were used:
time blastall -p blastn -d ESTs -i ESTs > /dev/null
time blastall -p blastp -d globins -i globins > /dev/null
Table 12-2. Performance benchmarks of various systems

                                blastn test               blastp test
CPU; clock speed                Time (sec)  Giga-cycles   Time (sec)  Giga-cycles
Macintosh G4; 550 MHz           1011        556           1599        879
Sun UltraSPARC III; 750 MHz     835         626           1427        1070
Intel Pentium III; 1 GHz        649         649           1187        1187
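The Giga-cycles columns in Table 12-2 equal the elapsed time multiplied by the clock speed in GHz. For example, 1011 s × 0.55 GHz ≈ 556. This removes clock speed from the comparison. The sketch below recomputes the columns with awk. The function name giga_cycles and the row labels are illustrative and were not part of the original benchmark. Read this way, the 550-MHz G4 finished the blastn test in fewer clock cycles than the 1-GHz Pentium III (556 versus 649 giga-cycles), even though the Pentium III was faster in wall-clock time. This is one concrete reason clock speed alone is a poor predictor of BLAST performance.

```shell
#!/bin/sh
# Giga-cycles = elapsed seconds x clock speed in GHz, i.e. roughly how many
# billions of clock ticks elapsed during the run, rounded to the nearest integer.
giga_cycles() {
  # $1 = elapsed seconds, $2 = clock speed in MHz
  awk -v s="$1" -v mhz="$2" 'BEGIN { printf "%d\n", s * mhz / 1000 + 0.5 }'
}

# Rows taken from Table 12-2: label, clock (MHz), blastn seconds, blastp seconds.
while read -r label mhz blastn blastp; do
  printf '%-16s blastn %5s Gcycles   blastp %5s Gcycles\n' \
    "$label" "$(giga_cycles "$blastn" "$mhz")" "$(giga_cycles "$blastp" "$mhz")"
done <<'EOF'
G4 550 1011 1599
UltraSPARC-III 750 835 1427
Pentium-III 1000 649 1187
EOF
```

The same function can be applied to your own timings from the two benchmark command lines. It lets you compare how much work each architecture does per clock cycle, not just which machine finishes first.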