GENSCAN is an online program based on a hidden Markov model (HMM). It identifies complete gene structures in genomic DNA and predicts the locations of genes and their exon–intron boundaries in genomic sequences from vertebrates, Arabidopsis, and maize. GENSCAN was developed by Christopher Burge in the Department of Mathematics, Stanford University (Burge and Karlin, 1997; Burge, 1998).
To predict the putative gene sequence(s) in a given input nucleotide sequence and annotate the sequence.
- Download a sequence (fewer than 1 million base pairs) from NCBI Nucleotide and save it in FASTA format using a plain-text editor such as Notepad. Here, the chromosome 1 sequence (CM000409.1) of the duck‐billed platypus (Ornithorhynchus anatinus) has been downloaded from NCBI (http://www.ncbi.nlm.nih.gov/nuccore/CM000409.1).
- The original sequence is larger than 1 megabase, so it must be trimmed from either end to just under 1 megabase (for example, using Notepad++). The input sequence should also be run through RepeatMasker to mask low‐complexity and repeat regions before submission.
- Open the GENSCAN web server: http://genes.mit.edu/GENSCAN.html.
- Set the parameters:
- Organism: select the appropriate option (“Vertebrate”, “Arabidopsis”, or “Maize”) from the “Organism” drop‐down menu. Here, we select “Vertebrate”.
- Suboptimal exon cutoff ...
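The trimming step described earlier (reducing the downloaded sequence to under 1 megabase) can also be scripted rather than done by hand in a text editor. The following is a minimal Python sketch, assuming a FASTA file with a single record. The function names and the file names in the usage comment are hypothetical, and the default of 999,999 bases simply reflects the "fewer than 1 million base pairs" limit stated above.

```python
from typing import Tuple

# Assumption: the limit is taken from the protocol text
# ("fewer than 1 million base pairs"), so keep at most 999,999 bases.
MAX_BASES = 999_999


def read_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) for the first record in FASTA text."""
    header = ""
    chunks = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # ignore any records after the first
            header = line[1:]
            seen_header = True
        else:
            chunks.append(line)
    return header, "".join(chunks)


def trim_sequence(seq: str, start: int = 0, max_bases: int = MAX_BASES) -> str:
    """Keep at most max_bases bases, beginning at 0-based offset start."""
    return seq[start:start + max_bases]


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a record as FASTA text with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def trim_fasta_file(in_path: str, out_path: str, start: int = 0) -> int:
    """Trim the first record of in_path and write it to out_path.

    Returns the number of bases written.
    """
    with open(in_path, "r", encoding="utf-8") as handle:
        header, seq = read_fasta(handle.read())
    trimmed = trim_sequence(seq, start=start)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(to_fasta(header, trimmed))
    return len(trimmed)


# Example usage (hypothetical file names):
# trim_fasta_file("CM000409.1.fasta", "CM000409.1_trimmed.fasta")
```

Changing `start` selects a different window of the chromosome, which corresponds to trimming from one end or the other. The sketch does not mask repeats, so the trimmed file still needs to go through the repeat-masking step.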