CHAPTER 22

PROTEIN FUNCTION PREDICTION WITH DATA-MINING TECHNIQUES

Xing-Ming Zhao and Luonan Chen

22.1 INTRODUCTION

One of the most challenging problems in the postgenomic era is to annotate uncharacterized proteins with biological functions. Over the past decades, a huge number of protein sequences have accumulated in public databases. However, proteins are being annotated far more slowly than new protein sequences accumulate.

Currently, about 25% of genes in the well-studied Saccharomyces cerevisiae remain uncharacterized, whereas only about 20% of genes in Homo sapiens are not annotated. Determining the functions of all proteins experimentally in a lab would be time-consuming and expensive. Computational biology that uses data-mining techniques provides an alternative way to predict protein functions from sequences, structures, gene expression profiles, and so on. For instance, a straightforward approach is to apply PSI-BLAST [1] or FASTA [63] to find homologous proteins and transfer their annotations to the target protein, on the assumption that proteins with similar sequences carry out similar functions. However, alignment-based methods may not work well when the sequence similarity between known proteins and the query protein is very low (e.g., below 30%). In such cases, alignment-free methods that use data-mining techniques [37, 47, 90, 98] provide an alternative solution, in which the alignment-free methods can detect ...
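
To make the idea of annotation transfer without alignment concrete, the following minimal sketch represents each sequence by its k-mer (here dipeptide) composition and assigns the query the label of its most similar annotated neighbor under cosine similarity. This is only one simple alignment-free representation, chosen for illustration; it is not necessarily the approach taken by the methods cited above. The function names, the sequences, and the labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Count overlapping k-mers in a protein sequence (no alignment needed)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def transfer_annotation(query, annotated, k=2):
    """Return (label, similarity) of the annotated sequence closest to the query."""
    query_profile = kmer_profile(query, k)
    best_label, best_score = None, -1.0
    for seq, label in annotated:
        score = cosine_similarity(query_profile, kmer_profile(seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy data: made-up sequences and labels, for illustration only.
annotated = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "hypothetical_function_A"),
    ("GSSGSSGSSGSSGPLPLPLP", "hypothetical_function_B"),
]
query = "MKTAYIAKQRNISFVKAHFSRQ"

label, score = transfer_annotation(query, annotated)
print(label, round(score, 3))
```

In practice, a nearest-neighbor rule like this would typically be replaced by a trained classifier over richer features, and a similarity threshold would be needed to avoid transferring annotations from unrelated proteins.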
