CHAPTER 5
EXACT SEARCH ALGORITHMS FOR BIOLOGICAL SEQUENCES
5.1 INTRODUCTION
With the development of sequencing techniques, it has become easy to obtain the sequence (i.e., the linear arrangement of residues [nucleotides or amino-acids]), of DNA, RNA, or protein molecules. However, determining the function of a molecule remains difficult and is often bound to finding a sequence similarity to another molecule whose role in the cell is at least partially known. Then the biologist can predict that both molecules share the same function and try to check this experimentally. Functional annotations are transferred from one sequence to another provided that their similarity is high enough. This procedure is also applied to molecule subparts, whose sequences are shorter; such as protein domains, DNA/RNA motifs, and so on.
Depending on the sequence lengths and the expected level of evolutionary relatedness, the sequence similarity can be found using alignment or pattern matching procedures. A quest in bioinformatics has been to design more sensitive sequence similarity searching methods to push further the limit or gray zone at which evolutionary sequence similarity cannot be departed from random sequence similarity [4,21]. These methods (e.g., profile hidden Markov Models) have provided, at the expense of computing time, important improvements in functional annotations. However, it has soon become clear that in other frameworks, only high-level ...