CHAPTER 7

KERNELS ON PROTEIN STRUCTURES

SOURANGSHU BHATTACHARYA, CHIRANJIB BHATTACHARYYA, AND NAGASUMA R. CHANDRA

7.1 INTRODUCTION

Kernel methods have emerged as one of the most powerful techniques for supervised as well as semisupervised learning [1], including learning with structured data. Kernels have been designed for various types of structured data, including sets [2–4], strings [5], and probability models [7,8]. Protein structures are another important type of structured data [9,10] and can be modelled as geometric structures or point sets [9,10]. This chapter concentrates on designing kernels on protein structures [7,11].

Classification of protein structures into different structural classes is a problem of fundamental interest in computational biology. Many hierarchical structural classification databases, describing various levels of structural similarity, have been developed. Two of the most popular are structural classification of proteins (SCOP) [12], a manually curated database, and class, architecture, topology and homologous superfamilies (CATH) [13], a semiautomatically curated database. Since the defining characteristics of each of these classes are not known precisely, machine learning is the tool of choice for attempting to classify proteins automatically. We propose to use the kernels designed here together with support vector machines to build automatic classifiers for protein structures.
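To make the idea of a kernel on point sets concrete, the following is a minimal sketch and not one of the kernels developed in this chapter. It assumes each structure is given as a list of 3D coordinates and uses a generic sum-of-Gaussians set kernel, normalized so that every structure has self-similarity 1. The function names, the bandwidth `sigma`, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
import math

def gaussian(p, q, sigma=1.0):
    """Gaussian (RBF) kernel between two 3D points."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def set_kernel(X, Y, sigma=1.0):
    """Sum of point-level Gaussian kernels over all pairs (p in X, q in Y)."""
    return sum(gaussian(p, q, sigma) for p in X for q in Y)

def normalized_set_kernel(X, Y, sigma=1.0):
    """Cosine-style normalization so that k(X, X) == 1 for every point set."""
    kxy = set_kernel(X, Y, sigma)
    return kxy / math.sqrt(set_kernel(X, X, sigma) * set_kernel(Y, Y, sigma))

def gram_matrix(structures, sigma=1.0):
    """Pairwise kernel values between all structures, as a list of lists."""
    return [[normalized_set_kernel(a, b, sigma) for b in structures]
            for a in structures]

# Made-up toy coordinates (not real protein data): toy_b is toy_a shifted
# slightly, toy_c lies far away from both.
toy_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.5), (3.0, 0.0, 1.0)]
toy_b = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.5), (3.1, 0.0, 1.0)]
toy_c = [(10.0, 10.0, 10.0), (12.0, 10.0, 10.0)]

K = gram_matrix([toy_a, toy_b, toy_c])
for row in K:
    print(["%.3f" % v for v in row])
```

A Gram matrix of this kind could be passed to a support vector machine that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Note that this toy kernel compares raw coordinates, so it is not invariant to rotating or translating one structure relative to another; a practical structure kernel would need to handle that, for instance through alignment or invariant features.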

Many kernels have been designed to capture similarities between ...
