IDENTIFYING STRUCTURAL DOMAINS IN PROTEINS
Analysis of protein structures typically begins with decomposition of the structure into more basic units called structural domains. The underlying goal is to reduce a complex protein structure to a set of simpler, yet structurally meaningful units, each of which can be analyzed independently. Structural semi-independence is the hallmark of domains: they often have a compact structure that can fold (and sometimes function) independently. The total number of distinct structural domains currently hovers around one thousand: they are represented by the unique folds in the SCOP classification (Murzin et al., 1995) or the unique topologies in the CATH classification (Orengo et al., 1997). Interestingly, this is the figure that Chothia predicted at a rather early stage of the Structural Genomics era (Chothia, 1992).
A significant fraction of these domains is universal to all life forms; others are kingdom-specific, and yet others are confined to subgroups of species (Ponting and Russell, 2002; Yang, Doolittle, and Bourne, 2005). The enormous variety of protein structures is then achieved by combining various domains within a single structure. This “combining” of domains can be achieved either by assembling single-domain polypeptide chains into a noncovalently linked structure or by joining domains (via gene fusion/recombination) on a single polypeptide chain that folds into the ...