Extracting genes from a reference using annotations

In this recipe, we will see how to extract a gene sequence with the help of an annotation file to get its coordinates against a reference FASTA. We will use the Anopheles gambiae genome, along with its annotation file (as per the previous two recipes). We will first extract the Voltage-gated sodium channel (VGSC) gene, which is involved in resistance to insecticides.

Getting ready

If you have followed the previous two recipes, you are ready. If not, download the Anopheles gambiae FASTA file, along with the GTF file. You also need to prepare the gffutils database:

import gffutils import sqlite3 try: db = gffutils.create_db('gambiae.gff.gz', 'ag.db') except sqlite3.OperationalError: db = gffutils.FeatureDB('ag.db') ...

Get Bioinformatics with Python Cookbook now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.