Tarek El Falah, Mourad Elloumi, and Thierry Lecroq


The motif finding problem consists in finding substrings that are more or less conserved in a set of strings. This problem is a fundamental one in both computer science and molecular biology. Indeed, when the concerned strings code biological sequences (i.e., DNA, RNA, and proteins), extracted motifs offer biologists many tracks to explore and help them to deal with challenging problems. Actually, a motif generally represents an expression that characterizes a set of biological sequences [4,13]. It generally codes a substructure and/or a biological function [8]. And hence, on the one hand, a motif can help biologists to learn about the biological functions of biological sequences and, consequently, can help them to understand the mechanisms of the biological processes in which these sequences are involved. On the other hand, a motif common to a set of biological sequences can help biologists to determine common biological ancestors to these biological sequences [14].

Despite many efforts, the motif finding problem remains a challenge for both computer scientists and biologists. Indeed, on the one hand, the general version of this problem is Nondeterministic Polynomial (NP)-hard [11], and on the other hand, our incomplete and fuzzy understanding of several biological mechanisms does not help us to provide good models for this problem.

In the literature, ...

Get Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.