June 2008
417 pages
Several institutes are collecting large databases of DNA information. In the United States, a major repository is GenBank, sponsored by the National Institutes of Health (http://www.ncbi.nlm.nih.gov/Genbank/index.html). This chapter develops programs that can read files stored in three of the most popular formats: FASTA, GenBank, and ASN.1.
The FASTA format is extremely simple, but it carries very little information beyond the sequence itself. A typical FASTA file is shown in Figure 6-1.
The first line, which begins with the '>' character, is a short header whose content may vary. In this case, it gives the accession number, the species name, and the chromosome number. ...
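Because the format is just a '>' header line followed by sequence lines, a reader for it can be very short. The following is a minimal Python sketch, not the chapter's own program. It assumes each record starts with a '>' header line and that every following line, up to the next '>', holds part of the sequence. The function names read_fasta and read_fasta_file are hypothetical. The header is kept as one unparsed string, since its content varies between files.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted lines.

    Assumption: each record begins with a line starting with '>', and the
    lines that follow it (up to the next '>') are pieces of the sequence.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()  # drop the '>' marker
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
        else:
            raise ValueError("sequence data found before any '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


def read_fasta_file(path: str) -> List[Tuple[str, str]]:
    """Read every record from a FASTA file on disk."""
    with open(path) as f:
        return read_fasta(f)


if __name__ == "__main__":
    # Hypothetical two-record example; the headers are made up.
    example = ">seq1 hypothetical header\nACGTACGT\nTTGA\n>seq2 another header\nGGCC\n"
    for hdr, seq in read_fasta(example.splitlines()):
        print(hdr, len(seq))
```

Leaving the header unparsed avoids guessing at a layout that differs between sources; splitting it (for example, treating the first whitespace-separated token as the accession number) is an assumption to check against the actual file. For heavier use, Biopython's Bio.SeqIO.parse(handle, "fasta") provides a tested reader.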