Chapter 10. DATABASES

An increasing amount of scientific data, information and literature is available on the internet in the form of scientific databases. Unlike their industrial counterparts, on-line scientific databases are typically exposed either as repositories of XML data or in the form of remote procedure calls (RPCs) that can be invoked over the internet as web services.

Of the scientific disciplines, the life sciences are currently by far the most advanced in terms of on-line databases. This is primarily due to the explosion in bioinformatics-related data, most notably DNA and protein sequences, that may now be interrogated over the internet.

The .NET framework was specifically designed to cater for the requirements of on-line databases, XML data and web services. Consequently, on-line scientific databases can be accessed quickly and efficiently from F# programs by leveraging the .NET framework. This chapter describes the use of existing .NET functionality to interrogate two of the most important scientific databases in the life sciences: the Protein Data Bank (PDB) and GenBank.

PROTEIN DATA BANK

The PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease. The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB), a non-profit consortium dedicated to improving the understanding of biological systems function through the study of the 3-D ...
