CHAPTER 18
MOTIF FINDING ALGORITHMS IN BIOLOGICAL SEQUENCES
18.1 INTRODUCTION
The motif finding problem consists in finding substrings that are more or less conserved in a set of strings. This problem is a fundamental one in both computer science and molecular biology. Indeed, when the concerned strings code biological sequences (i.e., DNA, RNA, and proteins), extracted motifs offer biologists many tracks to explore and help them to deal with challenging problems. Actually, a motif generally represents an expression that characterizes a set of biological sequences [4,13]. It generally codes a substructure and/or a biological function [8]. And hence, on the one hand, a motif can help biologists to learn about the biological functions of biological sequences and, consequently, can help them to understand the mechanisms of the biological processes in which these sequences are involved. On the other hand, a motif common to a set of biological sequences can help biologists to determine common biological ancestors to these biological sequences [14].
Despite many efforts, the motif finding problem remains a challenge for both computer scientists and biologists. Indeed, on the one hand, the general version of this problem is Nondeterministic Polynomial (NP)-hard [11], and on the other hand, our incomplete and fuzzy understanding of several biological mechanisms does not help us to provide good models for this problem.
In the literature, ...