Alexandros Stamatakis


The reconstruction of phylogenetic (evolutionary) trees from molecular sequence data is a comparatively old problem in bioinformatics, given that Joe Felsenstein’s seminal paper [12] on computing the maximum likelihood (ML) score on trees already was published in 1981 (i.e., almost 30 years ago). However, significant advances in wet lab molecular sequencing technologies, such as, for instance, the introduction of 454 sequencers [53], are generating a highly challenging and unprecedented molecular data flood. Although a plethora of challenging problems exist regarding the implementation, numerical stability, and parallelization of the phylogenetic likelihood function (PLF) on emerging parallel architectures, the design of new search algorithms is equally challenging and can even yield more impressive performance improvements than high-performance computing (HPC) methods.

The PLF typically consumes more than 95% of overall execution time in all state-of-the art likelihood-based phylogeny programs. Algorithmic design for phylogeny reconstruction therefore can be regarded as trying to minimize the number of invocations of the PLF while maximizing the likelihood score of the tree that is returned by the search algorithm. Ideally, algorithmic design and HPC implementations of the PLF should be conducted simultaneously because specific heuristics may require dedicated or appropriately ...

Get Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.