CHAPTER 21

APPROACHES AND METHODS FOR OPERON PREDICTION BASED ON MACHINE LEARNING TECHNIQUES

Yan Wang, You Zhou, Chunguang Zhou, Shuqin Wang, Wei Du, Chen Zhang and Yanchun Liang

21.1 INTRODUCTION

The concept of the operon first appeared in the theory of protein regulatory mechanisms proposed by Jacob and Monod. An operon is a basic transcriptional unit of genes in the complex biological processes of microbial genomes [1]. Operon prediction is therefore one of the most fundamental and important research fields in microbial genomics [2].

Generally, an operon is a cluster of one or more tandem genes delimited by a promoter and a terminator; its structure is shown in Figure 21.1. Operons usually share most of the following properties [3], which are very useful for identifying them:

1. An operon consists of one or more genes on the same strand of a genomic sequence.

2. Intergenic distances within an operon are generally shorter than the distances between adjacent gene pairs that do not belong to the same operon.

3. Generally, the genes of an operon share a common promoter and terminator, and the regions within the operon usually contain no promoter or terminator of their own.

4. Genes in an operon usually have related functions, and most belong to the same functional category, such as a cluster of orthologous groups (COG) [4].

5. As a functional unit, the genes in an operon tend to occur as conserved gene pairs and to have similar phylogenetic profiles.
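Properties 1 and 2 can be turned into a simple baseline grouping rule. The following is a minimal sketch, not the method of any predictor discussed in this chapter: it places adjacent genes in the same candidate operon when they lie on the same strand and their intergenic distance does not exceed a threshold. The `Gene` record, the function names, the example coordinates, and the 50 bp cutoff `max_distance` are all hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based inclusive start coordinate
    end: int     # 1-based inclusive end coordinate
    strand: str  # "+" or "-"


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Bases between two adjacent genes (negative if the genes overlap)."""
    return downstream.start - upstream.end - 1


def group_into_candidate_operons(genes: List[Gene],
                                 max_distance: int = 50) -> List[List[str]]:
    """Group genes by property 1 (same strand) and property 2 (short gap).

    max_distance is an assumed, illustrative cutoff, not a value from the text.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in ordered:
        if groups:
            previous = groups[-1][-1]
            same_strand = previous.strand == gene.strand
            close = intergenic_distance(previous, gene) <= max_distance
            if same_strand and close:
                groups[-1].append(gene)
                continue
        # Start a new candidate operon (possibly a single-gene one).
        groups.append([gene])
    return [[g.name for g in group] for group in groups]


# Hypothetical example genes; coordinates are made up.
example_genes = [
    Gene("g1", 100, 400, "+"),
    Gene("g2", 420, 900, "+"),
    Gene("g3", 905, 1500, "+"),
    Gene("g4", 1900, 2500, "+"),
    Gene("g5", 2510, 3000, "-"),
]
candidate_operons = group_into_candidate_operons(example_genes)
print(candidate_operons)
```

With these made-up coordinates, g1, g2 and g3 are grouped together, g4 is separated by a long gap, and g5 is separated because it lies on the opposite strand. A rule this simple ignores properties 3 to 5; promoter and terminator signals, functional categories, and phylogenetic profiles would have to be added as further features.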

Figure 21.1 The structure of an operon: g2, ...
