CHAPTER 37Genome Annotation in Eukaryotes
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
37.1 INTRODUCTION
GENSCAN, an HMM algorithm‐based online program, is used to identify complete gene structures in genomic DNA, and to predict the location of genes and their exon–intron boundaries in genomic sequences of vertebrates, Arabidopsis and maize. GENSCAN was developed by Christopher Burge of the Department of Mathematics, Stanford University (Burge and Karlin, 1997; Burge, 1998).
37.2 OBJECTIVE
To predict the putative gene sequence(s) in a given input nucleotide sequence and annotate the sequence.
37.3 PROCEDURE
- Download a sequence (fewer than 1 million base pairs) from NCBI Nucleotide, and save in Notepad in FASTA format: here, chromosome 1 (CM000409.1) sequence of duck‐billed platypus (Ornithorhynchus anatinus) has been downloaded from NCBI (http://www.ncbi.nlm.nih.gov/nuccore/CM000409.1).
- The original sequence is more than 1 megabase in size, so it needs to be trimmed from any termini to approximately 1 megabase in size (using Notepad ++). The user needs to subject the input sequence to repeat‐masker to remove low‐complexity, repeat regions in the input sequence.
- Open the GENSCAN web server: http://genes.mit.edu/GENSCAN.html.
- Set the parameters:
- Organism: select the appropriate option from “Vertebrate”, “Arabidopsis”, or “Maize”, available in the drop‐down options with “Organism”. Here, we will select “Vertebrate”.
- Suboptimal exon cutoff ...
Get Basic Applied Bioinformatics now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.