Manolis Christodoulakis and Costas S. Iliopoulos


It is a well-known fact, referred to as “the first fact of biological sequence analysis” [21], that biological sequences that are similar to each other tend to have similar two- or three-dimensional structure and/or perform similar biological functions. This fact is used to infer the function of a given gene or protein by finding similar sequences whose function is already known [35].

One of the most fundamental tools for visualising similarity between two sequences is the string alignment. Numerous pairwise alignment methods exist, including dot matrix analysis [19], various forms of dynamic programming (e.g., the Smith–Waterman local alignment algorithm [40] and the Needleman–Wunsch global alignment algorithm [34]), and heuristic methods (e.g., FASTA [30, 36] and BLAST [2]).
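To make the dynamic programming approach concrete, the following is a minimal sketch of global alignment scoring in the style of Needleman–Wunsch. It computes only the optimal score, not the alignment itself. The function name and the scoring values (match = 1, mismatch = -1, and a linear gap penalty of -1) are assumptions chosen for illustration and are not taken from the text; real applications typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    The scoring parameters are illustrative assumptions, not values
    prescribed by the chapter.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            # Align a[i-1] with a gap in b.
            up = score[i - 1][j] + gap
            # Align b[j-1] with a gap in a.
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Under these assumed parameters, aligning ACGT with AGT yields three matches and one gap. The table takes time and space proportional to the product of the two sequence lengths, which is one reason heuristic methods such as FASTA and BLAST are used for searching large databases.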

Although the importance of pairwise sequence alignment cannot be overstated, aligning more than two sequences simultaneously can be even more helpful in identifying similarities: subsequences that are conserved among all (or most) of the sequences, and that therefore possibly characterize the whole set, are easier to identify. As with pairwise alignments, several types of multiple alignment exist, such as global alignments (e.g., ClustalW [42]) and local alignments (e.g., Dialign [33]). Figure 8.1a shows a small portion of the alignment of a set of ...

From Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications.
