This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
wordmask=[method]
Default: Off
Filters the query sequence for seeding only. Low-complexity regions in the query sequence
aren't used in the initial word search but remain available for alignment during the extension
stage; this is called soft masking.
See also
filter, lcfilter, lcmask, echofilter, maskextra
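As a rough illustration of soft masking, the Python sketch below lists the seed words taken from a query while skipping any word that overlaps a masked region; the full, unmasked query is still what an extension stage would see. The function name seed_words, the representation of masked regions as half-open intervals, and the word size used here are assumptions made for illustration, not WU-BLAST internals.

```python
def seed_words(query, word_size, masked):
    """Return (offset, word) pairs usable for seeding.

    masked is a list of half-open (start, end) intervals. This is a
    hypothetical model of soft masking, not WU-BLAST's implementation.
    """
    words = []
    for i in range(len(query) - word_size + 1):
        j = i + word_size
        # Skip the word if it overlaps any masked interval.
        if any(i < end and start < j for start, end in masked):
            continue
        words.append((i, query[i:j]))
    return words


# A toy query with a poly-A run occupying positions 4..11.
query = "ACGT" + "A" * 8 + "CGTA"
# Pretend a low-complexity filter flagged that run.
seeds = seed_words(query, word_size=4, masked=[(4, 12)])
print(seeds)
# Only seeding is restricted; the query string itself is unchanged.
```

Only words lying entirely outside the masked interval survive as seeds, which is the essential difference from hard masking, where the masked letters would be replaced in the sequence itself.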
W=[integer]
Default: 11 for blastn, 3 for the other programs
Sets the word size for seeding alignments.
See also
T, hitdist, wink
X=[integer]
Default: Variable; depends on scoring parameters
Controls the alignment extension cutoff for ungapped alignments.
See also
gapX
Y=[number]
Default: Variable; depends on scoring parameters
Sets the effective size of the query sequence used in statistical calculations.
See also
Z
Z=[number]
Default: Variable; depends on scoring parameters
Sets the size of the database in letters (restest is assumed), but Z may also be used to mean
the number of sequences if seqtest is set.
See also
Y, seqtest, restest
xdformat Parameters
xdformat formats BLAST databases from FASTA files. It also reports descriptive
information about the database and dumps the entire content to FASTA format.
Chapter 14: WU-BLAST Reference
Here are some examples:
xdformat -n files
xdformat -p files
zcat fasta.*.gz | xdformat -o my_db -n -- -
xdformat -n -i database
xdformat -n -r database > fasta_file
-A [0..2]
Default: 2
When indexing accession.version identifiers, you have three indexing options:
0 Accession only; version isn’t stored
1 Stored as accession.version
2 Stored as both accession only and accession.version
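The three settings can be pictured as choosing which lookup keys are stored for each identifier. The Python sketch below is only a model of that choice; the function name index_keys and the sample identifier are hypothetical, and nothing here reflects how xdformat builds its index internally.

```python
def index_keys(identifier, mode=2):
    """Return the keys stored for an accession.version identifier.

    mode mirrors the -A settings 0, 1, and 2 described above.
    Hypothetical model only; not xdformat's code.
    """
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1, or 2")
    accession, _, _version = identifier.partition(".")
    if mode == 0:
        return [accession]              # accession only; version dropped
    if mode == 1:
        return [identifier]             # accession.version only
    return [accession, identifier]      # both forms


for mode in (0, 1, 2):
    print(mode, index_keys("AF123456.1", mode))
```

With the default of 2, a lookup by either the bare accession or the full accession.version would find the record, at the cost of storing two keys.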
-a [database]
Appends sequences to the named database. If the database is indexed, the appended
sequences will also be indexed.
-c [character]
Default: Off
If an invalid letter is encountered, xdformat terminates and reports an error message. If this
occurs, check the sequence file for errors. After checking, you may either skip illegal
characters with -k or change them to a legal character with -c. The typical operation for
nucleotides is to set -c N, and for proteins -c X.
See also
-k
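The difference between terminating, replacing (-c), and skipping (-k) can be sketched in a few lines of Python. The function clean and the legal alphabet below are assumptions for illustration; the set of letters xdformat actually accepts is larger and is not reproduced here.

```python
# Assumed legal alphabet for this sketch only.
NUCLEOTIDES = frozenset("ACGTN")


def clean(sequence, legal=NUCLEOTIDES, replace_with=None, skip=False):
    """Model the three ways of handling an invalid letter.

    replace_with plays the role of -c, skip the role of -k. With
    neither set, the first invalid letter raises an error, which
    stands in for xdformat terminating.
    """
    out = []
    for letter in sequence:
        if letter in legal:
            out.append(letter)
        elif replace_with is not None:
            out.append(replace_with)    # like -c N
        elif skip:
            continue                    # like -k
        else:
            raise ValueError(f"invalid letter {letter!r}")
    return "".join(out)


print(clean("ACGJT", replace_with="N"))
print(clean("ACGJT", skip=True))
```

Replacing keeps the sequence length and coordinates intact, whereas skipping shortens the sequence, which is one reason to inspect the errors before choosing either option.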
-D [integer]
Default: Unlimited
Sets the maximum length for definition lines.
-d [string]
Default: None
Sets a user-defined release date for the database. The date string may be at most 63
characters long.
See also
-v
-e [file]
Default: stderr
Appends information and errors to the named file.
-G
Default: Off
Prefaces each sequence with the database record number in the format of gnl|xdf|#.
-i
Default: Off
Reports descriptive information about a BLAST database. This is useful for determining
when a database was created, how many sequences it contains, and if it is indexed.
-K [integer]
Default: Unlimited
Sets the maximum number of identifiers with Control-A separators. This is useful for
trimming highly redundant sequences created with nrdb or another redundancy purifier that
uses Control-A separators.
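To make the effect concrete, the Python sketch below trims a definition line whose entries are joined by Control-A (byte 0x01). It assumes that the first entries are the ones kept; the function name trim_identifiers and the sample definition line are hypothetical.

```python
CTRL_A = "\x01"  # the Control-A separator


def trim_identifiers(defline, max_ids=None):
    """Keep at most max_ids Control-A-separated entries.

    max_ids=None models the unlimited default. Which entries
    xdformat keeps is an assumption here (the first ones).
    """
    if max_ids is None:
        return defline
    if max_ids < 1:
        raise ValueError("max_ids must be at least 1")
    return CTRL_A.join(defline.split(CTRL_A)[:max_ids])


defline = CTRL_A.join(["gi|1 first copy", "gi|2 second copy", "gi|3 third copy"])
trimmed = trim_identifiers(defline, max_ids=2)
print(trimmed.split(CTRL_A))
```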
-k
Default: Off
If an invalid letter is encountered, xdformat terminates. If this occurs, you can either skip
illegal characters with -k or change them to a legal letter with -c. Check the errors to
ensure the input file is formatted properly.
See also
-c
-L [number]
Default: 100000000 (100 million letters)
Sets the maximum sequence length. For optimal performance, break up large sequences
into smaller fragments no larger than 1 million letters.
-l [number]
Default: 0
Sets the minimum sequence length.
-M [number]
Default: 96m
Sets the cache size for indexing. For faster indexing, the size may be increased (for example,
-M 512m).
-O [4..8]
Default: 4
Sets the number of bytes of precision. The default value allows databases of up to 4 billion
amino acids or 16 billion nucleotides. If you expect a database to contain more than this
limit, increasing precision by one level multiplies the limit by 256. Setting -O is necessary
only if you append to the database because the precision automatically increases
appropriately when databases are created.
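The arithmetic behind these limits can be worked through in Python. The sketch assumes the limit is the number of values representable in the given number of bytes (256 to the power of the byte count) and that the nucleotide limit is four times the amino acid limit, as the rounded figures above suggest; the function name capacity is hypothetical.

```python
def capacity(precision_bytes):
    """Return (amino_acid_limit, nucleotide_limit) for an -O level.

    Assumes a limit of 256**bytes letters for proteins and four
    times that for nucleotides; both are inferences from the text.
    """
    if not 4 <= precision_bytes <= 8:
        raise ValueError("precision must be between 4 and 8 bytes")
    amino_acids = 256 ** precision_bytes
    nucleotides = 4 * amino_acids
    return amino_acids, nucleotides


for level in range(4, 9):
    aa, nt = capacity(level)
    print(f"-O {level}: about {aa:.2e} amino acids, {nt:.2e} nucleotides")
```

Under these assumptions the default of 4 bytes gives 2^32 (about 4.3 billion) amino acids and 2^34 (about 17 billion) nucleotides, in line with the rounded figures quoted above, and each extra byte multiplies both limits by 256.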
-P [integer]
Default: 60
This option applies only when dumping the entire content of a database with -r. -P
controls the length of the sequence lines; -P 0 puts the whole sequence on one line.
See also
-r
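The line-length behavior described for -P can be modeled with a short Python function. This is a sketch of the wrapping rule only, with a hypothetical function name, and does not read a real database.

```python
def wrap_sequence(sequence, width=60):
    """Split a sequence into output lines of at most width letters.

    width=0 puts the whole sequence on one line, mirroring -P 0.
    """
    if width < 0:
        raise ValueError("width must be zero or positive")
    if width == 0:
        return [sequence]
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


lines = wrap_sequence("A" * 130)
print([len(line) for line in lines])
print(wrap_sequence("ACGTACGT", width=0))
```

Putting each sequence on a single line can make a dumped FASTA file easier to process with line-oriented tools such as grep.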
-q [0..3]
Default: 0
Certain files may contain numerous nonfatal errors in their identifier format. -q quiets
these errors.
0 No silencing
1 Silences field 1 errors
2 Silences field 2 errors
3 Silences errors in all fields
-r
Default: Off
Reports (dumps) the entire database content to stdout in FASTA format.
-T [string]
Default: Off
This option lets you restrict indexing of identifiers to a particular database name or tag.
The [string] has two parts: part 1 is the name of the database (e.g., gb for GenBank or emb
for EMBL; see Chapter 10), and part 2 is either blank or a number.
