9

Bioinformatics Pipelines

Pipelines are fundamental to any data science environment, because data processing is never a single task. Many pipelines are implemented as ad hoc scripts. This can work well, but such scripts often fall short on several fundamental criteria, chiefly reproducibility, maintainability, and extensibility.
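To make the contrast with ad hoc scripts concrete, here is a minimal sketch of the core idea behind a pipeline: steps declare what they depend on, and a small runner works out the execution order. This is only an illustration using Python's standard library, not how any particular pipeline system is implemented. The names `run_pipeline`, `steps`, and `dependencies`, as well as the toy sequences, are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps, dependencies):
    """Run each step once, after all the steps it depends on.

    steps: dict mapping a step name to a function that receives
           the dict of results computed so far.
    dependencies: dict mapping a step name to the set of step
                  names that must run before it.
    """
    results = {}
    # static_order() yields every step after its prerequisites
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        results[name] = steps[name](results)
    return order, results


# Hypothetical three-step pipeline on toy sequence data
steps = {
    "load": lambda r: ["ACGT", "ACGN", "GGCC"],
    "filter": lambda r: [s for s in r["load"] if "N" not in s],
    "gc_content": lambda r: [
        (s.count("G") + s.count("C")) / len(s) for s in r["filter"]
    ],
}
dependencies = {
    "load": set(),
    "filter": {"load"},
    "gc_content": {"filter"},
}

order, results = run_pipeline(steps, dependencies)
print(order)
print(results["gc_content"])
```

Because the dependencies are declared as data rather than implied by the order of lines in a script, adding a step means adding one function and one dependency entry, which is the kind of extensibility that ad hoc scripts tend to lack.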

In bioinformatics, you can find three main types of pipeline system:

  • Frameworks such as Galaxy (https://usegalaxy.org), which are geared toward end users: they expose easy-to-use interfaces and hide most of the underlying machinery.
  • Programmatic workflows – geared toward code interfaces that, while generic, originate from the bioinformatics space. Two examples are Snakemake (https://snakemake.readthedocs.io/ ...

Get Bioinformatics with Python Cookbook - Third Edition now with the O’Reilly learning platform.
