CHAPTER 37 Genome Annotation in Eukaryotes

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

37.1 INTRODUCTION

GENSCAN is an online program based on a hidden Markov model (HMM). It identifies complete gene structures in genomic DNA, predicting the locations of genes and their exon–intron boundaries in genomic sequences from vertebrates, Arabidopsis, and maize. GENSCAN was developed by Christopher Burge of the Department of Mathematics, Stanford University (Burge and Karlin, 1997; Burge, 1998).

37.2 OBJECTIVE

To predict the putative gene sequence(s) in a given input nucleotide sequence and annotate the sequence.

37.3 PROCEDURE

  1. Download a sequence (fewer than 1 million base pairs) from NCBI Nucleotide and save it in FASTA format as a plain-text file (for example, using Notepad). Here, the chromosome 1 sequence (CM000409.1) of the duck-billed platypus (Ornithorhynchus anatinus) was downloaded from NCBI (http://www.ncbi.nlm.nih.gov/nuccore/CM000409.1).
  2. The original sequence is larger than 1 megabase, so it must be trimmed at either end to approximately 1 megabase (for example, using Notepad++). The user should also run the input sequence through RepeatMasker to mask low-complexity and repetitive regions.
  3. Open the GENSCAN web server: http://genes.mit.edu/GENSCAN.html.
  4. Set the parameters:
    1. Organism: select “Vertebrate”, “Arabidopsis”, or “Maize” from the “Organism” drop-down list. Here, we select “Vertebrate”.
    2. Suboptimal exon cutoff ...
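
The trimming described in step 2 of the procedure can also be scripted rather than done by hand in a text editor. The following is a minimal sketch in Python, assuming a single-record FASTA file. The function names (parse_fasta, trim_sequence, format_fasta) and the 70-character line width are illustrative choices and are not part of GENSCAN or NCBI tooling. Repeat masking is not covered here and would still need to be done separately.

```python
# Minimal sketch: trim a single FASTA record to at most max_len bases.
# All names below are hypothetical helpers written for this example.

def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; ignore the rest
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


def trim_sequence(seq, max_len=1_000_000, start=0):
    """Keep at most max_len bases, beginning at 0-based position start."""
    return seq[start:start + max_len]


def format_fasta(header, seq, width=70):
    """Return FASTA text with the sequence wrapped at width characters."""
    lines = [">" + header]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy demonstration with a 12-base sequence trimmed to 10 bases.
demo = ">toy record\nACGTACGT\nACGT\n"
header, seq = parse_fasta(demo)
trimmed = trim_sequence(seq, max_len=10)
print(format_fasta(header + " (first 10 bases)", trimmed, width=5))
```

For a real input, the file contents would be read into a string, passed through the same three functions with the default max_len of 1,000,000, and the result written to a new file for upload to the GENSCAN server. Changing start selects a different window of the chromosome instead of the first megabase.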
