
This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Command-Line Tutorial
You should see the following output:
>AU109017 AU109017 Caenorhabditis elegans cDNA clone:yk701g7 : 5' end, single read.
TGGCCTACTGGGGTTTAATTACCCAAGTTTGAGATGGCTGCTGCTTCAGTGAAAGGCTTT
TTCCAGCGGACCGGAATCAGCATCAAAGAATATTTTAAACGAATGGGAAATGATTATGCT
ACTGTAGCTAGGGAAACTGTCCAAGGATGTAAAGATAGACCTGTTAAAGCTGGAGTTGTT
TTCTCTGGGCTCGGTTTTTTAACCTATGCATATCAGACAAATCCAACAGAGCTGGAAATG
TATGATTATTTATGCGAGAGACGACAAAAGTTAGTTTTGGTCCCGAATTCTGAGCATAAT
CCGGCTACAACTAAAGAATTAACTGCTCGCGA
To retrieve a sequence from a protein database, use a similar command with the -p option:
xdget -p globins HBP_CANLI
You should see the following output:
>HBP_CANLI P42511 Leghemoglobin.
MGAFSEKQESLVKSSWEAFKQNVPHHSAVFYTLILEKAPAAQNMFSFLSNGVDPNNPKLK
AHAEKVFKMTVDSAVQLRAKGEVVLADPTLGSVHVQKGVLDPHFLVVKEALLKTFKEAVG
DKWNDELGNAWEVAYDELAAAIKKAMGSA
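Both outputs above are FASTA records: a definition line that starts with ">" followed by wrapped sequence lines. If you want to check a retrieved record in a script, a minimal Python sketch along the following lines may help. The function name parse_fasta is hypothetical and not part of xdget, and the sketch assumes the record text has already been captured as a string.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA record.

    Hypothetical helper for illustration; it is not part of xdget.
    """
    ident = None
    desc = ""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if ident is not None:
                # A new definition line closes the previous record.
                yield ident, desc, "".join(chunks)
            header = line[1:].split(None, 1)
            ident = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif ident is not None:
            chunks.append(line)  # sequence lines are joined without newlines
    if ident is not None:
        yield ident, desc, "".join(chunks)


# The protein record shown above, copied from the xdget output.
record = """>HBP_CANLI P42511 Leghemoglobin.
MGAFSEKQESLVKSSWEAFKQNVPHHSAVFYTLILEKAPAAQNMFSFLSNGVDPNNPKLK
AHAEKVFKMTVDSAVQLRAKGEVVLADPTLGSVHVQKGVLDPHFLVVKEALLKTFKEAVG
DKWNDELGNAWEVAYDELAAAIKKAMGSA
"""

for ident, desc, seq in parse_fasta(record):
    # Print the identifier, the rest of the definition line, and the length.
    print(ident, desc, len(seq))
```

Splitting the definition line at the first whitespace treats the first word as the identifier, which matches the identifier passed to xdget in the command above.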
nrdb and patdb
The nrdb and patdb programs are useful for removing the redundant sequences you
saw earlier. Use nrdb first:
nrdb globins > globins_nr
In addition to the nonredundant database written to globins_nr, nrdb reports the following summary:
            --------- Records ---------    ----------- Residues -----------
Database    Read    Duplicate   Written    Read       Duplicate    Written
globins     1203           39      1164    211,084       37,819    173,265
Totals:     1203           39      1164    211,084       37,819    173,265
No. of base word hits: 53 (53 total)
No. of 32-bit hash hits: 39
Total memory allocated: 0.500 MB
Longest comment line ...