Let's take a look at the following steps:
- Let's start by listing the chromosomes of the A. gambiae genome:
import gzip
from Bio import SeqIO
gambiae_name = 'gambiae.fa.gz'
atroparvus_name = 'atroparvus.fa.gz'
recs = SeqIO.parse(gzip.open(gambiae_name, 'rt', encoding='utf-8'), 'fasta')
for rec in recs:
    print(rec.description)
This will produce the following output:
chromosome:AgamP3:2L:1:49364325:1 chromosome 2L
chromosome:AgamP3:2R:1:61545105:1 chromosome 2R
chromosome:AgamP3:3L:1:41963435:1 chromosome 3L
chromosome:AgamP3:3R:1:53200684:1 chromosome 3R
chromosome:AgamP3:UNKN:1:42389979:1 chromosome UNKN
chromosome:AgamP3:X:1:24393108:1 chromosome X
chromosome:AgamP3:Y_unplaced:1:237045:1 chromosome Y_unplaced
The code is quite straightforward. ...
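Each description line shown above packs several colon-separated fields into its first token. As a minimal sketch, the helper below splits one such line into named parts. The function name `parse_description` and the field labels are illustrative and not part of Biopython, and the field order (type, assembly, name, start, end, strand) is an assumption based on the output above.

```python
def parse_description(description):
    """Split a description such as
    'chromosome:AgamP3:2L:1:49364325:1 chromosome 2L' into named fields.

    Hypothetical helper: the field order is assumed from the output shown.
    """
    # The first whitespace-delimited token holds the colon-separated location.
    location = description.split()[0]
    seq_type, assembly, name, start, end, strand = location.split(':')
    return {
        'type': seq_type,
        'assembly': assembly,
        'name': name,
        'start': int(start),
        'end': int(end),
        'strand': int(strand),  # assumed meaning of the last field
    }


# Usage with one of the descriptions printed above.
info = parse_description('chromosome:AgamP3:2L:1:49364325:1 chromosome 2L')
print(info['name'], info['end'])
```

Inside the loop over `recs`, the same helper could be called on `rec.description`, for example to build a table of chromosome names and lengths.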