How to do it...

Before you start coding, note that you can inspect the BAM file using samtools view -h (this assumes you have SAMtools installed, which we recommend even if you use the Genome Analysis Toolkit (GATK) or another tool for variant calling). We suggest that you take a look at the header and the first few records. The SAM format is too complex to describe here, and plenty of information about it is available on the internet; still, there is sometimes really interesting information buried in these headers.
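To make the header inspection more concrete, here is a minimal sketch that parses SAM header text into Python dictionaries using only the standard library. It is not the recipe's own code: the function name parse_sam_header and the contents of SAMPLE_HEADER (sample names, program IDs, command line) are hypothetical and stand in for whatever your own BAM file contains. It relies only on the documented header layout, in which each line starts with @ and a two-letter record type followed by tab-separated TAG:VALUE fields, and @CO lines carry free text.

```python
from collections import defaultdict

# Hypothetical header text, shaped like the header section of a SAM/BAM file.
# Every name and value below is made up for illustration.
SAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@RG\tID:rg1\tSM:sample1\tPL:ILLUMINA",
    "@PG\tID:aligner\tPN:aligner\tCL:aligner mem ref.fa reads.fq",
    "@PG\tID:dedup\tPN:dedup\tPP:aligner",
    "@CO\tfree-text comment",
])


def parse_sam_header(text):
    """Group header lines by record type (HD, SQ, RG, PG, CO)."""
    records = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("@"):
            continue  # alignment records are not part of the header
        record_type, *fields = line.split("\t")
        record_type = record_type[1:]  # drop the leading "@"
        if record_type == "CO":
            # Comment lines are free text, not TAG:VALUE pairs.
            records["CO"].append("\t".join(fields))
            continue
        # Split on the first colon only, since values may contain colons.
        records[record_type].append(
            dict(field.split(":", 1) for field in fields if ":" in field)
        )
    return dict(records)


header = parse_sam_header(SAMPLE_HEADER)

# Show each processing step and the step that preceded it (the PP tag).
for pg in header.get("PG", []):
    print(pg["ID"], "<-", pg.get("PP", "(first step)"))
```

To try this on real data, you could capture the output of samtools view -H (which prints only the header) and pass that text to the function instead of SAMPLE_HEADER.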

One of the most complex operations in NGS is generating good alignment files from raw sequence data. This involves not only calling the aligner but also cleaning up the data. Now, in the @PG headers of high-quality BAM files, ...
