22

STRUCTURAL ANNOTATION OF GENOMES

Adam J. Reid, Corin Yeats, Jonathan Lees, and Christine A. Orengo

INTRODUCTION

Physical techniques such as X-ray crystallography and NMR allow individual protein structures to be determined at the atomic level. Such information is invaluable for detailed studies of protein function. However, determining protein structures directly by physical methods is laborious and expensive. Although there were structures for ~8000 different proteins in the Protein Data Bank (http://www.rcsb.org/pdb/) as of June 2007, there are currently ~5 million protein sequences in UniProt (http://www.ebi.uniprot.org), and the number of sequences is growing faster than the number of solved structures.

Therefore, time and cost constraints make it impossible to directly determine the structures of proteins in all genomes with current technology. Fortunately, however, homology modeling allows reasonably accurate prediction of structure for sequences with >40% sequence identity over their whole length (Marti-Renom et al., 2000). In addition, fold recognition techniques allow structures to be predicted at much lower levels of sequence similarity, and this can often give helpful insights into protein function. Consequently, only a sampling of structures needs to be solved, although it is not simple to determine how this should be done (see Section "Can We Determine All the Structures Present in the Genomes?—Structural Annotation of Genomes and Structural Genomics"). ...
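As a rough illustration of the >40% identity criterion mentioned above, the following minimal sketch computes percent identity for a pair of sequences that have already been aligned, and checks it against the threshold. The function names, the use of `-` as the gap character, and the choice of the full alignment length as the denominator are assumptions made for this example. Percent identity can be defined in several ways, and the text does not specify which one is meant.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the full alignment length.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length, and columns containing a gap count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # Count columns where both residues are identical and neither is a gap.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def suitable_for_homology_modeling(
    aligned_a: str, aligned_b: str, threshold: float = 40.0
) -> bool:
    """Return True when identity exceeds the (illustrative) 40% threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold


if __name__ == "__main__":
    # Toy alignment, not real data: 7 of 10 columns are identical.
    query = "ACDE-GHIKL"
    template = "ACDQFGHAKL"
    print(percent_identity(query, template))
    print(suitable_for_homology_modeling(query, template))
```

In practice the alignment itself would come from a dedicated alignment tool, and the threshold is a rule of thumb rather than a hard cutoff.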
