Using Parallelization Programs, such as GNU Parallel and OpenMP, with Serial Tools
This document provides examples and methods for using parallel logic to process multiple data sets across multiple cores on one or more servers. It does not cover Message Passing Interface (MPI) batch processing, in which multiple nodes share a single processing job.
Basic knowledge of shell scripting is helpful but not absolutely necessary.
This page is helpful for beginners: http://linuxcommand.org/lc3_wss0020.php.
Key terms used in these scripts:
- For loops
- GNU Parallel (http://www.gnu.org/software/parallel/)
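As a minimal sketch of these two terms, the script below runs the same per-file step first in a serial for loop and then in parallel. The sample_N.txt files, the temporary directory, the job count of 4, and the use of wc -l as a stand-in for a serial tool are all hypothetical placeholders, not part of the workflow described in this document. If GNU Parallel is not installed, the sketch falls back to plain background jobs and wait.

```shell
#!/bin/sh
# Sketch only: the sample_N.txt inputs and the "wc -l" step are hypothetical
# stand-ins for real data files and a real serial tool.
set -eu

workdir=$(mktemp -d)
for i in 1 2 3 4; do
    printf 'data %s\n' "$i" > "$workdir/sample_$i.txt"
done

# 1) Serial for loop: one file at a time, on one core.
for f in "$workdir"/sample_*.txt; do
    wc -l < "$f" > "$f.out"
done
serial_count=$(ls "$workdir"/*.out | wc -l | tr -d ' ')
rm -f "$workdir"/*.out

# 2) The same step for every file, up to 4 jobs at once.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # {} is replaced by each argument listed after :::
    parallel -j 4 'wc -l < {} > {}.out' ::: "$workdir"/sample_*.txt
else
    # Fallback when GNU Parallel is absent: background jobs, then wait.
    for f in "$workdir"/sample_*.txt; do
        ( wc -l < "$f" > "$f.out" ) &
    done
    wait
fi
parallel_count=$(ls "$workdir"/*.out | wc -l | tr -d ' ')

summary="serial=$serial_count parallel=$parallel_count"
echo "$summary"
rm -rf "$workdir"
```

Both approaches produce one output file per input. The difference is that the for loop waits for each file to finish before starting the next, while GNU Parallel keeps up to the number of jobs given with -j running at the same time.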
Next-generation sequencing (NGS) tools used:
- Picard-tools (http://broadinstitute.github.io/picard/)
- Burrows-Wheeler Aligner (http://sourceforge.net/projects/bio-bwa/)
- PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/)
HPC Resource Manager:
BIO HPC Use Case 1
Biologist AF receives 24 Binary Alignment Map (BAM) files from a third-party lab. AF tries to index these BAM files with Picard, but the resulting index is corrupt and unusable. AF contacts the lab and learns that the files had not been correctly processed (aligned, sorted, read groups added, etc.).
Process BAM data using NGS tools. ...