Bioinformatics with Python Cookbook by Tiago Antao

Extracting genes from a reference using annotations

In this recipe, we will see how to extract a gene sequence from a reference FASTA, using an annotation file to obtain the gene's coordinates. We will use the Anopheles gambiae genome, along with its annotation file (as per the previous two recipes). We will first extract the voltage-gated sodium channel (VGSC) gene, which is involved in resistance to insecticides.
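
The core idea, turning annotation coordinates into a slice of the reference sequence, can be sketched with the standard library alone. This is a minimal illustration on a made-up toy sequence, not the recipe's own code: the names extract_feature, reverse_complement, and toy_chrom are hypothetical, and the sketch assumes the usual GFF/GTF convention of 1-based, end-inclusive coordinates with a strand of "+" or "-".

```python
# Toy sketch: slice a feature out of a reference using GFF-style coordinates.
# All names and data below are hypothetical and only illustrate the idea.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(chrom_seq, start, end, strand="+"):
    """Extract a feature given 1-based, end-inclusive coordinates."""
    if start < 1 or end > len(chrom_seq) or start > end:
        raise ValueError("coordinates fall outside the sequence")
    # GFF/GTF is 1-based and end-inclusive; Python slices are 0-based
    # and end-exclusive, so only the start needs shifting.
    sub = chrom_seq[start - 1:end]
    # Features on the minus strand are read from the opposite strand.
    return reverse_complement(sub) if strand == "-" else sub


toy_chrom = "AACCGGTTAC"  # made-up sequence standing in for a chromosome
forward_gene = extract_feature(toy_chrom, 2, 5, "+")
reverse_gene = extract_feature(toy_chrom, 2, 5, "-")
print(forward_gene, reverse_gene)
```

With a real genome, the sequence would come from the reference FASTA and the start, end, and strand values from the annotation database, but the coordinate arithmetic stays the same.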

Getting ready

If you have followed the previous two recipes, you are ready. If not, download the Anopheles gambiae FASTA file, along with the GTF file. You also need to prepare the gffutils database:

import gffutils
import sqlite3

try:
    db = gffutils.create_db('gambiae.gff.gz', 'ag.db')
except sqlite3.OperationalError:
    db = gffutils.FeatureDB('ag.db')
...