Bioinformatics with Python Cookbook - Second Edition

by Tiago Antao
November 2018
Intermediate to advanced
360 pages
9h 36m
English
Packt Publishing

Using high-performance data formats – HDF5

Processing VCF files is very slow: even an empty for loop over a big VCF file can easily take days, because parsing text is computationally demanding. NumPy arrays, by contrast, are fast, but you are limited to whatever fits in memory. There are several ways to deal with both of these problems (and we will explore more than one in this chapter). Here, we will consider representing our data in the HDF5 format.

We will use an existing HDF5 file that was exported from a VCF file and extract some basic data from it.
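
As a rough illustration of why HDF5 helps here, the following sketch uses h5py to read only a slice of a genotype dataset from disk instead of loading the whole array into memory. It does not use the book's file: it builds a small synthetic file first, and the dataset names (`calldata/genotype`, `variants/POS`), the file name `demo.h5`, and the helpers `build_demo_file` and `count_alt_alleles` are assumptions made up for this example.

```python
import os
import tempfile

import h5py
import numpy as np


def build_demo_file(path, n_variants=1000, n_samples=8):
    """Write a small synthetic HDF5 file (hypothetical layout, not the book's file)."""
    rng = np.random.default_rng(42)
    with h5py.File(path, "w") as h5:
        calldata = h5.create_group("calldata")
        # Genotypes: variants x samples x ploidy, stored on disk in compressed chunks.
        calldata.create_dataset(
            "genotype",
            data=rng.integers(0, 2, size=(n_variants, n_samples, 2), dtype=np.int8),
            chunks=(100, n_samples, 2),
            compression="gzip",
        )
        variants = h5.create_group("variants")
        # Made-up positions: 100, 200, 300, ...
        variants.create_dataset("POS", data=np.arange(1, n_variants + 1) * 100)


def count_alt_alleles(path, start, stop):
    """Read only rows [start, stop) and count alternate alleles per variant."""
    with h5py.File(path, "r") as h5:
        genotypes = h5["calldata/genotype"]  # a handle; no data is loaded yet
        block = genotypes[start:stop]        # slicing reads just this block as a NumPy array
        positions = h5["variants/POS"][start:stop]
    # Sum over the sample and ploidy axes: one count per variant.
    return positions, block.sum(axis=(1, 2))


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.h5")
    build_demo_file(demo_path)
    positions, alt_counts = count_alt_alleles(demo_path, 10, 20)
    print(positions.shape, alt_counts.shape)
```

The point of the design is that slicing an h5py dataset returns an in-memory NumPy array for just the requested rows, so a file far larger than RAM can still be processed block by block without any text parsing.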


ISBN: 9781789344691