
Bioinformatics with Python Cookbook by Tiago Antao

Analyzing data in the variant call format

After running a genotype caller (for example, GATK or samtools), you will have a variant call format (VCF) file reporting genomic variations such as single-nucleotide polymorphisms (SNPs), insertions/deletions (INDELs), copy number variations (CNVs), and so on. In this recipe, we will discuss VCF processing with the PyVCF module.
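PyVCF does the parsing for us, but it helps to know what it is parsing. The sketch below is not PyVCF and does not mimic its API. It is a minimal, standard-library-only illustration, assuming an uncompressed, tab-separated VCF, of how data lines split into fixed columns (CHROM, POS, ID, REF, ALT, ...) and how records can be roughly classified as SNPs or INDELs. The helper names `parse_vcf` and `classify`, the simplified classification rules, and the tiny in-memory VCF (loosely based on the example in the VCF specification) are all hypothetical.

```python
from collections import Counter

# Hypothetical miniature VCF held in memory (not the book's dataset).
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14
20\t17330\t.\tT\tA\t3\tq10\tDP=11
20\t1234567\tmicrosat1\tGTC\tG,GTCT\t50\tPASS\tDP=9
"""


def classify(ref, alts):
    """Label a record 'SNP', 'INDEL' or 'other' (simplified, hypothetical rules)."""
    if any(alt.startswith("<") or alt == "." for alt in alts):
        return "other"  # symbolic alleles such as <DEL>/<CNV>, or no ALT at all
    if len(ref) == 1 and all(len(alt) == 1 for alt in alts):
        return "SNP"  # single base replaced by single base(s)
    if any(len(alt) != len(ref) for alt in alts):
        return "INDEL"  # at least one allele changes the length
    return "other"  # e.g. equal-length multi-base substitutions


def parse_vcf(lines):
    """Yield (chrom, pos, ref, alts, kind) for every data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        alts = alt.split(",")  # ALT may list several comma-separated alleles
        yield chrom, int(pos), ref, alts, classify(ref, alts)


records = list(parse_vcf(EXAMPLE_VCF.splitlines()))
counts = Counter(kind for *_, kind in records)
print(counts)  # Counter({'SNP': 2, 'INDEL': 1})
```

In real work, prefer PyVCF or another maintained parser: it also deals with compressed files, the INFO and FORMAT fields, and per-sample genotypes, all of which this sketch ignores.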

Getting ready

While next-generation sequencing is all about big data, there is a limit to how much I can ask you to download as a dataset for this book. I believe that 2 to 20 GB of data for a tutorial is asking too much. Although the 1000 Genomes Project VCF files with realistic annotations are of this order of magnitude, we will work with much less data here. Fortunately, the ...
