Basic Applied Bioinformatics
by Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, Mir Asif Iquebal
CHAPTER 5 Sequence Format Conversion
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
5.1 INTRODUCTION
A computer file format is a specific way of encoding data for storage in a file. Biological sequence formats are a family of distinct file formats, each designed to make sequence files readable by specific programs.
Note: Biological sequences are generally written in a monospaced font such as Courier New. This keeps the residues aligned uniformly in each line of the text.
Sequence formats are manipulated or inter‐converted by the system at the base level as ASCII (American Standard Code for Information Interchange) text, a character encoding in which each character is stored as a numeric code: A–Z are encoded as 65–90 and a–z as 97–122. A sequence format is thus the required arrangement of characters, symbols, and keywords that specifies the sequence, ID name, comments, and so on.
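The character codes mentioned above can be checked directly. The following is a minimal Python sketch; the helper name `ascii_codes` is a hypothetical choice for illustration and is not taken from the chapter.

```python
def ascii_codes(text):
    """Return the ASCII code of each character in text (hypothetical helper)."""
    return [ord(ch) for ch in text]


# Uppercase letters fall in 65-90, lowercase letters in 97-122.
print(ascii_codes("AZ"))    # [65, 90]
print(ascii_codes("az"))    # [97, 122]

# A short nucleotide string as the computer stores it.
print(ascii_codes("ACGT"))  # [65, 67, 71, 84]
```

Because a sequence file is just such a series of codes, converting between formats amounts to rearranging plain text rather than changing the sequence data itself.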
Conversion between sequence formats is needed for two purposes:
- Different programs recognize different formats. We need to convert a sequence from one format to another before a given program can use it.
- Presentations of the molecular sequence are sometimes required in a particular format.
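As a rough illustration of what such a conversion involves, the sketch below converts a single record between Pearson/Fasta and Plain/Raw text. It assumes the common convention that a FASTA record is a header line beginning with `>` followed by one or more sequence lines, and that a raw sequence is the residues alone. The function names, the example record, and the 60-character default line width are assumptions made for illustration, not the chapter's method.

```python
def fasta_to_raw(fasta_text):
    """Drop the '>' header line and join the sequence lines of one FASTA record."""
    lines = fasta_text.strip().splitlines()
    seq_lines = [ln.strip() for ln in lines if ln and not ln.startswith(">")]
    return "".join(seq_lines)


def raw_to_fasta(seq, seq_id, width=60):
    """Wrap a raw sequence at `width` characters under a '>' header line."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + seq_id + "\n" + "\n".join(chunks) + "\n"


# Hypothetical record used only for demonstration.
example = ">seq1 hypothetical example\nATGGCC\nTTAA\n"

raw = fasta_to_raw(example)
print(raw)                                 # ATGGCCTTAA
print(raw_to_fasta(raw, "seq1", width=6))  # header, then ATGGCC and TTAA on separate lines
```

The sequence itself is unchanged by the round trip; only the header and line layout differ, which is what a program expecting one format or the other actually checks.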
Commonly used sequence formats.
| 1. IG/Stanford | 7. Fitch | 13. Plain/Raw |
| 2. GenBank/GB | 8. Pearson/Fasta | 14. PIR/CODATA |
| 3. NBRF | 9. Zuker (in‐only) | 15. MSF |
| 4. EMBL | 10. Olsen (in‐only) | 16. ASN.1 |
| 5. GCG | 11. Phylip3.2 | 17. PAUP |
| 6. DNAStrider | 12. Phylip | 18. Pretty (out‐only) |
5.2 OBJECTIVE
To convert the format of a given molecular sequence to other ...