BLAST by Joseph Bedell, Mark Yandell, Ian Korf

This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 13: NCBI-BLAST Reference
-Z [integer]
Default: 25
Programs: All
Sets the X3 dropoff value (in bits) for extensions but is bounded by the value for X2. It’s
generally not necessary to adjust this parameter.
formatdb Parameters
formatdb turns FASTA files into BLAST databases. (ASN.1 format is also acceptable,
but because it isn’t commonly used, it isn’t covered in this book. You can find more
information about ASN.1 at http://www.ncbi.nlm.nih.gov/Sitemap/Summary/asn1.html/.)
Chapter 11 discusses the typical methods for building BLAST databases and
examines the NCBI identifier syntax required for some aspects of formatdb and
blastall. Here are a few sample command lines:
formatdb -i protein_db
formatdb -p F -i nucleotide_db
zcat est*.gz | formatdb -p F -i stdin -o -n est -v 2000000000
The following reference lists the default value for each formatdb parameter.
-B [file]
Default: Optional
Specifies a binary GI output file. The advantage of using a binary GI file is that it’s smaller
than a corresponding text file and can be read directly into memory without being parsed.
See the -F option.
To convert a text GI file to binary, use the following command:
formatdb -F text_gi_list -B binary_gi_list
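A text GI list is a plain file with one GI number per line. As a rough sketch, and assuming every defline of interest begins with the NCBI gi|number| identifier, such a list could be pulled from a FASTA file with a small shell function. The name extract_gis is hypothetical and isn’t part of the NCBI toolkit; the file names in the usage comment are placeholders.

```shell
# extract_gis: hypothetical helper, not an NCBI program.
# Reads FASTA on stdin and prints the GI number of every defline that
# starts with "gi|<digits>|", one number per line. Deflines without a
# leading GI identifier are silently skipped.
extract_gis() {
    sed -n 's/^>gi|\([0-9][0-9]*\)|.*/\1/p'
}

# Possible usage (placeholder file names):
#   extract_gis < nucleotide_db > text_gi_list
#   formatdb -F text_gi_list -B binary_gi_list
```

In practice a GI list usually holds only a selected subset of the database, so the output would typically be filtered before it is used.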
-F [file]
Default: Optional
Specifies a GI file, either text or binary. This is used for creating an alias database that
contains no sequences, only pointers to sequences stored in another database (which
may itself be an alias database). See the -L parameter. The databases must use the NCBI
FASTA identifier syntax, include GI numbers, and be indexed with -o.
-i [file]
Default: Required
Sets the input FASTA file. You may specify that input come from stdin with -i stdin, but
you must also set the -n parameter to give the database a name. If you wish to make a
single BLAST database from multiple FASTA files, pipe them to formatdb as follows:
cat file1 file2 file3 | formatdb -i stdin -n my_db
-l [file]
Default: formatdb.log
Specifies an output log file. Log messages are appended to this file.
-L [file]
Default: Optional
Creates an alias database, which has several uses. It can be a simple synonym for another
database, a selection of specific records from a database (see the -F option), or a static
virtual database. Alias databases have the .pal or .nal extension, depending on whether they
contain protein or nucleotide sequences.
To create an alias database with a selected set of GI numbers:
formatdb -i db -F gi_list -L alias_name -p [T/F]
To merge databases, first create a synonymous alias and then edit it to include additional
database names. Chapter 11 covers this process in more detail.
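The editing step can be sketched in shell. This assumes the alias file lists its member databases on a single line that begins with the keyword DBLIST, which is how alias files written by formatdb are typically laid out; check your own .pal or .nal file before relying on it. The function name add_to_dblist and the file names are hypothetical.

```shell
# add_to_dblist: hypothetical helper, not an NCBI program.
# $1 is an existing alias file; the remaining arguments are database
# base names to append to its DBLIST line. The edited file is written
# to stdout, and the original is left untouched.
# Assumption: names contain no "|" or "&" characters, which would be
# misread by the sed expression below.
add_to_dblist() {
    alias_file=$1
    shift
    sed "s|^DBLIST.*|& $*|" "$alias_file"
}

# Possible usage (placeholder names):
#   add_to_dblist merged.nal db2 db3 > merged.nal.new
#   mv merged.nal.new merged.nal
```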
-n [string]
Default: Optional, required with -i stdin
Sets the base name for the BLAST database. If not specified, the name of the FASTA file
will be used. If the input is from stdin, this parameter must be set.
-o [T/F]
Default: Optional
Creates indexes. Indexing the databases isn’t required but is recommended. Alias databases
that use GI lists (see -F and -L options) and the blastall -l option require indexed
databases. Additionally, some blastall output options specified with the -m parameter
require indexing. Indexing adds four files with extensions .nnd, .nni, .nsd, and .nsi for
nucleotides and .pnd, .pni, .psd, and .psi for proteins. If you know you don’t need indexes,
you can save space by omitting -o.
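To see whether a database was built with indexes, you can look for the four index files named above. The following is a minimal sketch that assumes a single-volume database whose files share one base name; index_files and missing_index_files are hypothetical helper names.

```shell
# index_files: hypothetical helper, not an NCBI program.
# Prints the four index file names expected for a database.
# $1 = database base name, $2 = T (protein) or F (nucleotide), as with -p.
index_files() {
    case "$2" in
        T) prefix=p ;;
        F) prefix=n ;;
        *) echo "usage: index_files base_name T|F" >&2; return 1 ;;
    esac
    for ext in nd ni sd si; do
        echo "$1.$prefix$ext"
    done
}

# missing_index_files: prints each expected index file that does not
# exist. No output means all four index files are present.
missing_index_files() {
    index_files "$1" "$2" | while read -r f; do
        [ -e "$f" ] || echo "$f"
    done
}

# Possible usage (placeholder name):
#   missing_index_files est F
```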
If GI numbers are included and more than one sequence has the same GI number,
formatdb terminates with an error. If accession numbers aren’t unique, an error won’t be
issued (see -V).
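Because a repeated GI number stops formatdb, it can be worth checking a large FASTA file before indexing it. Here is a minimal sketch, assuming the deflines begin with the NCBI gi|number| identifier; find_duplicate_gis is a hypothetical name.

```shell
# find_duplicate_gis: hypothetical helper, not an NCBI program.
# Reads FASTA on stdin and prints every GI number that appears on more
# than one defline, once each. No output means no duplicates were found
# among deflines that start with "gi|<digits>|".
find_duplicate_gis() {
    sed -n 's/^>gi|\([0-9][0-9]*\)|.*/\1/p' | sort | uniq -d
}

# Possible usage (placeholder file names):
#   cat file1 file2 file3 | find_duplicate_gis
```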
-p [T/F]
Default: T
Specifies the type of file being formatted. By default, formatdb assumes the file is
protein, so you must set -p F whenever you format nucleotide databases.
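Forgetting -p F is an easy mistake to make in scripts. The sketch below guesses the setting from the sequence itself; it is only a heuristic and isn’t something formatdb does. It assumes that a file whose first 100 sequence lines are at least 90% A, C, G, T, U, or N is nucleotide, and both the threshold and the name guess_p_flag are arbitrary choices made for illustration.

```shell
# guess_p_flag: hypothetical helper, not an NCBI program.
# Reads FASTA on stdin and prints F (nucleotide) or T (protein).
# Heuristic only: samples the first 100 sequence lines and prints F when
# at least 90% of the residues are A, C, G, T, U, or N.
guess_p_flag() {
    seq=$(grep -v '^>' | head -n 100 | tr -d ' \t\r\n' | tr '[:lower:]' '[:upper:]')
    total=${#seq}
    if [ "$total" -eq 0 ]; then
        echo "no sequence data found" >&2
        return 1
    fi
    # Count only the nucleotide-like letters in the sample.
    nuc=$(printf '%s' "$seq" | tr -cd 'ACGTUN' | wc -c)
    if [ $((nuc * 100)) -ge $((total * 90)) ]; then
        echo F
    else
        echo T
    fi
}

# Possible usage (placeholder file name):
#   formatdb -p "$(guess_p_flag < my_db)" -i my_db
```

Short or low-complexity protein sequences can fool a check like this, so treat the result as a hint rather than a guarantee.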