After downloading the data, follow these steps:
- First, start with a few imports:
import pickle
import gzip
import random
import numpy as np
import h5py
import pandas as pd
- Let's get the sample metadata:
samples = pd.read_csv('samples.tsv', sep='\t')
print(len(samples))
print(samples['cross'].unique())
print(samples[samples['cross'] == 'cross-29-2'][['id', 'function']])
print(len(samples[samples['cross'] == 'cross-29-2']))
print(samples[samples['function'] == 'parent'])
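Besides loading the metadata, this prints the total number of samples, the available crosses, the members of the cross we are going to use (cross-29-2) along with their count, and all the parents.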
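To make the filtering pattern easier to follow without the real file, here is a minimal sketch on a tiny in-memory table. The column names `id`, `cross`, and `function` come from the code above; the rows, the sample ids, and the cross names `cross-A` and `cross-B` are invented for illustration and do not reflect the actual contents of `samples.tsv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for samples.tsv: ids and cross names are made up.
tsv_text = (
    "id\tcross\tfunction\n"
    "s1\tcross-A\tparent\n"
    "s2\tcross-A\tparent\n"
    "s3\tcross-A\tprogeny\n"
    "s4\tcross-B\tparent\n"
)
samples = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Build the boolean mask once and reuse it for both the view and the count.
in_cross = samples["cross"] == "cross-A"
cross_samples = samples.loc[in_cross, ["id", "function"]]
n_in_cross = int(in_cross.sum())

# Select the parents across every cross.
parents = samples[samples["function"] == "parent"]

print(samples["cross"].unique())
print(cross_samples)
print(n_in_cross)
print(parents["id"].tolist())
```

Storing the mask in a variable avoids evaluating the same comparison twice, which is the only difference from the one-line expressions used in the recipe.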
- We prepare to deal with chromosome arm 3L based on its HDF5 file:
h5_3L = h5py.File('ag1000g.crosses.phase1.ar3sites.3L.h5', 'r')
samples_hdf5 = list(map(lambda sample: sample.decode('utf-8'), ...
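The decoding step above converts sample names from byte strings to ordinary Python strings, so that they can be compared with the `id` column of the metadata. The sketch below shows the same conversion on a made-up list, assuming the names arrive as UTF-8 encoded `bytes`. The names `raw_samples` and `index_of` and the sample values are hypothetical, and no HDF5 file is opened here.

```python
# Hypothetical byte-string sample names standing in for the values read from HDF5.
raw_samples = [b"sample-1", b"sample-2", b"sample-3"]

# A list comprehension is equivalent to list(map(lambda s: s.decode('utf-8'), ...)).
samples_decoded = [name.decode("utf-8") for name in raw_samples]

# Optional helper (an assumption, not part of the recipe): map each name to its
# position, which is handy when looking up a sample's column in a genotype array.
index_of = {name: position for position, name in enumerate(samples_decoded)}

print(samples_decoded)
print(index_of["sample-2"])
```

Without the decoding, a comparison such as `b"sample-1" == "sample-1"` is `False` in Python 3, so lookups against the pandas metadata would silently find nothing.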