9

Building workflows that traverse the bioinformatics data landscape

Robert Stevens, Paul Fisher, Jun Zhao, Carole Goble and Andy Brass

ABSTRACT

The bioinformatics data landscape confronts scientists with significant problems when they perform data analyses. The nature of these analyses is, in part, driven by the data landscape. This raises issues in managing the scientific process of in silico experimentation in bioinformatics. The myGrid project has addressed these issues through workflows. Although they raise some issues of their own, workflows have allowed scientists to traverse the bioinformatics landscape effectively. The high-throughput nature of workflows, however, has forced us to move from a task of data gathering to one of data gathering and management. Utilizing workflows in this manner has enabled a systematic, unbiased, and explicit approach that is less susceptible to premature triage. This has profoundly changed the nature of bioinformatics analysis. The use of the Taverna workflow workbench is illustrated through an example from the study of trypanosomiasis resistance in the mouse model. In this study, novel biological results were obtained by traversing the bioinformatics landscape with workflows.

9.1 Introduction

This chapter describes the Taverna workflow workbench (Oinn et al., 2006), developed under the myGrid project, and discusses its role in building workflows that are used in bioinformatics analyses. Taverna is an application that allows a bioinformatician to describe the flow of data between a series ...

From Data Mining Techniques in Grid Computing Environments.