Python for Bioinformatics

by Jason Kinser
June 2008
Beginner to intermediate
417 pages
10h 41m
English
Jones & Bartlett Learning

6 Parsing DNA Data Files

Several institutions maintain large databases of DNA sequence data. In the United States, a major repository is GenBank, which is sponsored by the National Institutes of Health (http://www.ncbi.nlm.nih.gov/Genbank/index.html). This chapter develops programs that read files stored in three of the most popular formats: FASTA, GenBank, and ASN.1.

6.1 FASTA Files

The FASTA format is extremely simple, but it contains very little information aside from the sequence. A typical FASTA file is shown in Figure 6-1.

The first line is a short header whose content may vary. In this case, it gives the accession number, the species name, and the chromosome number. ...
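The header-plus-sequence layout described above can be read with a few lines of Python. What follows is a minimal sketch, not the book's own code: it assumes the usual FASTA convention that a header line begins with ">" and that the sequence may be wrapped over several lines. The function name `read_fasta` and the sample record are hypothetical, invented only for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical record, made up for this example (not real GenBank data).
sample = io.StringIO(
    ">XX000000.1 Example species chromosome 1\n"
    "ACGTACGTAC\n"
    "GGTTAACC\n"
)
records = list(read_fasta(sample))
header, sequence = records[0]
print(header)
print(len(sequence))
```

Writing the reader as a generator keeps only one record in memory at a time, which may matter for large files. To read from disk, the same function could be given a file opened with `open(path)` in place of the `io.StringIO` object.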


ISBN: 9780763751869