November 2018
Intermediate to advanced
360 pages
9h 36m
English
Take a look at the following steps:
from math import ceil
import numpy as np
import h5py

h5_3L = h5py.File('ag1000g.phase1.ar3.pass.3L.h5', 'r')
samples = h5_3L['/3L/samples']
calldata_genotype = h5_3L['/3L/calldata/genotype']
positions = h5_3L['/3L/variants/POS']
alt_alleles = h5_3L['/3L/variants/ALT']
is_snp = h5_3L['/3L/variants/is_snp']
num_samples = len(samples)
There are alternatives to h5py, but be careful, as they might impose constraints on keys and data (for instance, the read methods of pandas might do this). Note that although we now hold references to these datasets, none of their contents have been loaded into memory yet; data is read from disk only when we index or slice a dataset.
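To make the point about lazy access concrete, here is a minimal sketch. It does not use the real Ag1000G file: it writes a small, hypothetical toy file (`toy.h5`, with made-up position values) to a temporary directory, and only the key `/3L/variants/POS` is borrowed from the code above. It assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy file standing in for the real data (an assumption
# for illustration; the values below are made up).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'toy.h5')

# Write 1,000 fake positions; h5py creates the intermediate groups.
with h5py.File(path, 'w') as h5:
    h5.create_dataset('/3L/variants/POS',
                      data=np.arange(1, 1001, dtype=np.int32))

with h5py.File(path, 'r') as h5:
    # This is only a handle to the on-disk dataset, not its contents.
    positions = h5['/3L/variants/POS']
    is_lazy = isinstance(positions, h5py.Dataset)

    # The length comes from the dataset's shape metadata.
    num_positions = len(positions)

    # Slicing is what actually reads data, and only the requested part.
    first_block = positions[:10]
    is_array = isinstance(first_block, np.ndarray)
```

The handle behaves much like an array for indexing purposes, but each slice triggers a read from disk and returns an ordinary NumPy array. This is what makes it practical to work through a large dataset in blocks rather than loading it whole.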