10
THE PDB FORMAT, mmCIF FORMATS, AND OTHER DATA FORMATS
INTRODUCTION
This chapter presents the data formats and protocols used to represent primary macromolecular structure data. The historical format used by the Protein Data Bank (PDB) is described first, followed by dictionary-based representations such as the macromolecular Crystallographic Information File (mmCIF) and extensions such as the PDB Exchange dictionary. The translation of the PDB Exchange dictionary into XML, as the PDB Markup Language (PDBML), is also described. Finally, protocols that provide data access through application programming interfaces are described.
THE PDB FORMAT
The Protein Data Bank (http://www.pdb.org/; see also Chapter 11) (Bernstein et al., 1977; Berman et al., 2000) was established in 1971 by Walter Hamilton at Brookhaven National Laboratory, in response to community requests for a central repository of information on biological macromolecular structures. Seven structures were included in the PDB at its inception. The essential elements of the format used to encode these first entries are still the core of the PDB format used today. Because of the simplicity of the format and its consistency in representing three-dimensional structures, the PDB format remains the most widely supported means of exchanging macromolecular structure data.
The PDB format consists of a collection of fixed-format records that describe the atomic coordinates, chemical ...
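To make the idea of a fixed-format record concrete, the following is a minimal sketch of reading one coordinate record by column position. The column boundaries are an assumption taken from the published wwPDB format guide for ATOM/HETATM records, not from the text above, and the names `AtomRecord`, `parse_atom_record`, and `SAMPLE_LINE` are hypothetical, introduced only for illustration. The sample line is assembled field by field so that each column range is visible.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """Hypothetical container for a subset of the fields in one ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_record(line: str) -> AtomRecord:
    """Slice one 80-column record by position (assumed wwPDB column layout)."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice is valid
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not a coordinate record: " + repr(line[0:6]))
    return AtomRecord(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )


# Illustrative record with made-up values, built from its fixed-width fields.
SAMPLE_LINE = (
    "ATOM  "      # 1-6   record name
    "    1"       # 7-11  atom serial number
    " "           # 12    unused
    " N  "        # 13-16 atom name
    " "           # 17    alternate location indicator
    "MET"         # 18-20 residue name
    " "           # 21    unused
    "A"           # 22    chain identifier
    "   1"        # 23-26 residue sequence number
    " "           # 27    insertion code
    "   "         # 28-30 unused
    "  27.340"    # 31-38 x coordinate
    "  24.430"    # 39-46 y coordinate
    "   2.614"    # 47-54 z coordinate
    "  1.00"      # 55-60 occupancy
    "  9.67"      # 61-66 temperature factor
    "          "  # 67-76 unused
    " N"          # 77-78 element symbol
    "  "          # 79-80 charge
)

atom = parse_atom_record(SAMPLE_LINE)
print(atom.name, atom.res_name, atom.chain_id, atom.x, atom.y, atom.z)
```

Slicing by position rather than splitting on whitespace is the safer choice for such records, because adjacent fields can fill their full width and leave no separating space.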