CHAPTER 21
APPROACHES AND METHODS FOR OPERON PREDICTION BASED ON MACHINE LEARNING TECHNIQUES
21.1 INTRODUCTION
The concept of the operon first appeared in the theory of protein regulatory mechanisms proposed by Jacob and Monod. An operon is a basic transcriptional unit of genes in the complex biological processes of microbial genomes [1]. Operon prediction is therefore one of the most fundamental and important research fields in microbial genomics [2].
Generally, an operon is a cluster of one or more tandem genes delimited by a promoter and a terminator; its structure is shown in Figure 21.1. Operons usually share most of the following properties [3], which are very useful for identifying them:
1. An operon consists of one or more genes on the same strand of a genomic sequence.
2. Intergenic distances within an operon are generally shorter than the distances between adjacent genes that do not belong to the same operon.
3. Generally, the genes of an operon share a common promoter and terminator, and the regions between genes within an operon usually do not contain any promoter or terminator.
4. Genes in an operon usually have related functions, and most belong to the same functional category, such as the same cluster of orthologous groups (COG) [4].
5. The genes in an operon, as a functional unit, tend to occur as conserved gene pairs across genomes and to have similar phylogenetic profiles.
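To make these properties concrete, the following minimal Python sketch turns properties 1, 2, and 4 into features for each pair of adjacent genes. It is an illustration only, not a method taken from this chapter: the `Gene` record, the function names, and the toy coordinates and category labels are all hypothetical, and properties 3 and 5 are omitted because they require promoter, terminator, and comparative-genomics data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; field names are assumptions for this sketch."""
    name: str
    start: int   # 1-based inclusive start coordinate
    end: int     # 1-based inclusive end coordinate
    strand: str  # "+" or "-"
    cog: str     # functional category label (toy annotation)


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Bases between two genes; negative if the genes overlap."""
    first, second = sorted((a, b), key=lambda g: g.start)
    return second.start - first.end - 1


def pair_features(a: Gene, b: Gene) -> dict:
    """Features for one gene pair, following properties 1, 2, and 4."""
    return {
        "same_strand": a.strand == b.strand,     # property 1
        "distance": intergenic_distance(a, b),   # property 2
        "same_cog": a.cog == b.cog,              # property 4
    }


def adjacent_pair_features(genes):
    """Compute features for every pair of neighbouring genes on the genome."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [(a.name, b.name, pair_features(a, b))
            for a, b in zip(ordered, ordered[1:])]


# Toy genes with made-up coordinates, not taken from any real genome.
toy_genes = [
    Gene("g1", 100, 400, "+", "E"),
    Gene("g2", 420, 900, "+", "E"),
    Gene("g3", 1300, 1800, "-", "K"),
]

for left, right, feats in adjacent_pair_features(toy_genes):
    print(left, right, feats)
```

In this toy input, the pair g1 and g2 lies on the same strand, is separated by only 19 bases, and shares a functional category, so it looks like a same-operon pair; g2 and g3 differ on all three features. Feature vectors of this kind are the sort of input a machine learning classifier could be trained on.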