CHAPTER 16

NOVEL COMBINATORIAL AND INFORMATION-THEORETIC ALIGNMENT-FREE DISTANCES BIOLOGICAL DATA MINING

Chiara Epifanio, Alessandra Gabriele, Raffaele Giancarlo, and Marinella Sciortino

16.1 INTRODUCTION

Sequence comparison plays a central role in many scientific areas [70] and, in particular, for computational molecular biology [41, 81]. In fact, biological molecules can be seen as linear sequences of discrete units similar to linguistic representations, despite their physical nature as a three-dimensional (3D) structure and the dynamic nature of molecular evolution.

The problem of defining good mathematical functions to measure similarity between biological sequences is fundamental for their analysis. Indeed, a high similarity among biological sequences, as measured by mathematical functions, usually gives a strong indication of functional relatedness and/or common ancestry [41]. However, there is not a bijective correspondence between sequence and structure or sequence and function because, in general, the converse of the previous statement is not true.

Classically, the notions of similarity and distance hinge on sequence alignment [41]. Algorithmically, these methods usually are implemented by using dynamic programming and target specific goals such as global and local alignments, with many of their variants (e.g., [40, 41, 60, 72). Because of their worst case superlinear running time in the input parameters, these methods are considered inadequate for the analysis of long ...

Get Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.