Structural Bioinformatics, 2nd Edition

THE CATH DOMAIN STRUCTURE DATABASE

Frances M. G. Pearl, Alison Cuff, and Christine A. Orengo

INTRODUCTION

Protein sequences change during evolution through both mutations of their residues and the insertion and deletion of residues. These changes give rise to families of related proteins. The earliest protein family resources were established in the 1970s through the pioneering work of Dayhoff, and many other sequence databases have been established since then. These resources are derived solely from sequence data, and relationships are often detected using alignment methods based on powerful dynamic programming algorithms adapted from computer science. Such methods handle residue insertions and deletions between distant evolutionary relatives very efficiently.
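To make the dynamic programming idea concrete, the following is a minimal sketch of global alignment scoring in the style of Needleman and Wunsch. It is an illustration only, not the method used by any particular database: the function name and the match, mismatch, and gap values are assumptions chosen for simplicity, whereas real tools use residue substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Illustrative scoring only (hypothetical parameters): a fixed
    match/mismatch score and a linear penalty per inserted or
    deleted residue.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Substitution (or match) of a[i-1] with b[j-1].
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Residue a[i-1] aligned to a gap (deletion relative to b).
            up = score[i - 1][j] + gap
            # Residue b[j-1] aligned to a gap (insertion relative to a).
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]
```

Because each cell depends only on three neighbouring cells, the table is filled in time proportional to the product of the two sequence lengths, which is what allows insertions and deletions to be handled efficiently without enumerating every possible alignment.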

Structural data have always been sparser than sequence data because of the technical challenges of structure determination. There is currently a discrepancy of over two orders of magnitude between the sequence and structure resources: while the Protein Data Bank (PDB) contains about 42,500 structural entries, the sequence databank at the NCBI (GenBank) contains over 60 million entries.
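The size of this gap can be checked with a short calculation. The sketch below uses the approximate entry counts quoted above; the function name is hypothetical, and since GenBank is described only as holding "over 60 million" entries, the result is a lower bound.

```python
import math


def orders_of_magnitude(larger, smaller):
    """Return log10 of the ratio between two counts."""
    return math.log10(larger / smaller)


pdb_entries = 42_500          # approximate PDB count quoted in the text
genbank_entries = 60_000_000  # lower bound on the GenBank count quoted in the text

ratio = genbank_entries / pdb_entries
gap = orders_of_magnitude(genbank_entries, pdb_entries)
print(f"ratio: {ratio:.0f}x, orders of magnitude: {gap:.2f}")
```

With these figures the ratio is roughly 1,400 to 1, which is consistent with the statement that the discrepancy exceeds two orders of magnitude.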

Although the first crystal structures were solved in the early 1970s, it was not until the mid-1990s that structural classifications began to emerge, primarily with SCOP (Murzin et al., 1995; Andreeva et al., 2004), DALI (Holm and Sander, 1996), and CATH (Orengo et al., 1997; Greene et al., 2007) databases and data ...

Structural Bioinformatics, 2nd Edition by Jenny Gu, Philip E. Bourne
