How to do it...

Take a look at the following steps:

  1. We will start by doing the necessary imports and checking Dask's version:
from multiprocessing.pool import Pool
from math import ceil
import numpy as np
import h5py
import dask
import dask.array as da
import dask.multiprocessing
print(dask.__version__)

Make sure that Dask is at least version 0.19.2, as we will be using fairly recent features.
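
If you would rather test the version in code than read the printed value, the comparison can be sketched as follows. This is a minimal illustration, not part of the recipe: `version_tuple` and `is_at_least` are hypothetical helpers that only look at the leading digits of each dot-separated part. It avoids the pitfall of comparing version strings directly, where '0.19.10' would sort before '0.19.2'. For anything beyond simple numeric versions, `packaging.version.parse` is the more robust choice.

```python
import re


def version_tuple(version):
    """Turn a string such as '0.19.2' into (0, 19, 2).

    Only the leading digits of each dot-separated part are read, so
    '2.0.0rc1' becomes (2, 0, 0). This is a simplification, not a full
    version parser.
    """
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break  # stop at the first part that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


MINIMUM_DASK = '0.19.2'  # the minimum version named in this recipe
```

With Dask imported, the check would then read `is_at_least(dask.__version__, MINIMUM_DASK)`.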

  2. Now, load some HDF5 data for processing:
h5_3L = h5py.File('ag1000g.phase1.ar3.pass.3L.h5', 'r')
samples = h5_3L['/3L/samples']
positions = h5_3L['/3L/variants/POS']
num_samples = len(samples)
del samples
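
If you do not have the HDF5 file at hand, you can still get a feel for how Dask treats an array like `positions`. The sketch below is an assumption-laden illustration, not the recipe's own code: `fake_positions` is a made-up NumPy array standing in for the `/3L/variants/POS` dataset, and the chunk size of 2500 is arbitrary. It uses `da.from_array`, one common way to wrap an existing array-like object in a chunked Dask array.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for h5_3L['/3L/variants/POS']:
# 10,000 increasing integer positions.
fake_positions = np.arange(1, 10_001, dtype=np.int64)

# Wrap the array in a Dask array split into chunks of 2500 elements.
# The chunk size is an arbitrary choice for illustration.
positions_da = da.from_array(fake_positions, chunks=2500)

# numblocks reports how many chunks there are along each axis.
num_chunks = positions_da.numblocks[0]

# Nothing is computed until .compute() is called; max() only builds a task graph.
max_position = int(positions_da.max().compute())

print(num_chunks, max_position)
```

Each chunk can be processed independently, which is what lets Dask spread the work across workers.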

While this recipe is a Dask version of the previous one, there will be slight differences imposed by the Dask programming model. At this stage, notice that ...
