Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Hardware and Software
This chapter explores how to optimize BLAST searches for maximum throughput and
will help you get the most out of your current and future hardware and software.
The first rule of BLAST performance is to optimize your BLAST parameters.
Incorrect settings can cause BLAST to run slowly, and you can often achieve
surprising increases in speed by adjusting a parameter or two. Chapter 9 can help
you choose the correct parameters for a particular experiment. If you’re already
running BLAST efficiently and want the most BLAST performance possible, read on.
The Persistence of Memory
Modern operating systems cache files. You may hear this referred to as RAM cache
or disk cache, but we’ll just call it cache. Once a file is read from the
filesystem (e.g., a hard disk), it is kept in memory even after it is no longer
in use, assuming there’s enough free RAM to do so. Why cache files? The same file
is often requested repeatedly, and retrieving it from memory is much faster than
reading it from disk, so keeping it in memory can save a lot of time. Caching can
be very important in sequential BLAST searches if the database is located on a
slow disk or across a network. While the first search may be limited by the speed
at which the database can be read, subsequent searches can be much faster.
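One rough way to see the cache at work is to read the same file twice and compare
the elapsed times. The sketch below is only an illustration and is not part of
BLAST: the names timed_read, compare_reads, and speed_increase are hypothetical,
and the path you pass in is whatever database file you want to test. If the file
was read or written recently, it may already be cached, in which case both reads
will be fast and the ratio will be close to 1.

```python
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file in chunks; return (seconds elapsed, bytes read)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total


def compare_reads(path):
    """Read the file twice; the second read can be served from cache."""
    first, size = timed_read(path)
    second, _ = timed_read(path)
    return first, second, size


def speed_increase(first_seconds, second_seconds):
    """Ratio of the first elapsed time to the second elapsed time."""
    if second_seconds <= 0:
        raise ValueError("second_seconds must be positive")
    return first_seconds / second_seconds
```

Calling compare_reads on a database file returns the two elapsed times and the
file size. Passing the two times to speed_increase gives a ratio of the kind
usually reported as a speed increase: for example, 12 seconds followed by 7
seconds is a ratio of about 1.71. This measures raw file reading only, so it
shows the largest benefit caching could offer, not the benefit for a real search.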
The advantage of caching is most appreciable for insensitive BLAST searches, such as
BLASTN with a large word size. In more sensitive searches, retrieving sequences
from the database becomes a smaller fraction of the total elapsed time. In Table 12-1,
note how the speed increase from caching is a function of sensitivity (here, word
Table 12-1. How caching benefits insensitive searches
Program   Word size   Search 1   Search 2   Speed increase
BLASTN    W=12        12 sec     7 sec      1.71x
BLASTN    W=10        33 sec     28 sec     1.18x