Take a look at the following steps:
- Let's get the metadata for our samples. We will load the population of each sample and note all individuals that are offspring of others in the dataset:
    from collections import defaultdict
    f = open('relationships_w_pops_121708.txt')
    pop_ind = defaultdict(list)
    f.readline()  # header
    offspring = []
    for l in f:
        toks = l.rstrip().split('\t')
        fam_id = toks[0]
        ind_id = toks[1]
        mom = toks[2]
        dad = toks[3]
        if mom != '0' or dad != '0':
            offspring.append((fam_id, ind_id))
        pop = toks[-1]
        pop_ind[pop].append((fam_id, ind_id))
    f.close()
This will load a dictionary where population is the key (CEU, YRI, and so on) and its value is the list of individuals in that population. This dictionary will also store information ...
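To make the resulting structure concrete, here is a minimal sketch that runs the same parsing logic on a small in-memory table instead of the real file. The sample rows, the header names, and the `load_pops` helper are all made up for illustration. The only assumptions carried over from the code above are that the first four columns are family ID, individual ID, mother, and father, and that the population is in the last column.

```python
import io
from collections import defaultdict

# Hypothetical sample data (not taken from relationships_w_pops_121708.txt).
# Column order mirrors what the recipe code assumes: family ID, individual ID,
# mother, father, and the population in the last column.
SAMPLE = (
    "FID\tIID\tmom\tdad\tpopulation\n"
    "F1\tI1\t0\t0\tCEU\n"
    "F1\tI2\t0\t0\tCEU\n"
    "F1\tI3\tI1\tI2\tCEU\n"
    "F2\tI4\t0\t0\tYRI\n"
)


def load_pops(handle):
    """Illustrative helper: group (fam_id, ind_id) pairs by population
    and collect every individual that has a listed parent."""
    pop_ind = defaultdict(list)
    offspring = []
    handle.readline()  # skip the header line
    for line in handle:
        toks = line.rstrip().split('\t')
        fam_id, ind_id, mom, dad = toks[:4]
        # A parent ID other than '0' marks this individual as an offspring
        if mom != '0' or dad != '0':
            offspring.append((fam_id, ind_id))
        pop_ind[toks[-1]].append((fam_id, ind_id))
    return pop_ind, offspring


pop_ind, offspring = load_pops(io.StringIO(SAMPLE))
print(dict(pop_ind))
print(offspring)
```

With this sample, `pop_ind` has two keys, `CEU` and `YRI`, and `offspring` contains only `('F1', 'I3')`, the one individual with non-zero parent IDs.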