CHAPTER 12
ALGORITHMS FOR THE ALIGNMENT OF BIOLOGICAL SEQUENCES
12.1 INTRODUCTION
Bioinformatics is the science dedicated to the automated processing of information about biological macromolecules (i.e., DNA, RNA, and proteins). These macromolecules are encoded as strings called biological sequences, in which every character represents one constituent of the macromolecule: a nucleotide for DNA and RNA, an amino acid for proteins. DNA, RNA, and proteins are encoded over the alphabets {A, T, C, G}, {A, U, C, G}, and {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}, respectively. Among the most studied problems in bioinformatics is the comparison of biological sequences in order to identify similar substrings that occur in the same order in these sequences. This operation contributes substantially to the analysis of biological macromolecules. By identifying regions shared by the sequences that encode macromolecules from several different organisms, it can reveal functions that these macromolecules have in common. Such regions, which have been conserved during evolution, often play an important structural or functional role and consequently shed light on the mechanisms and biological processes in which the macromolecules participate. The comparison of biological sequences also permits the detection of functional regions. It is also used in evolutionary studies to analyze relationships that exist ...