Take a look at the following steps:
- Let's load the metadata (we will use a simplified version from the previous recipe) as follows:
    from collections import defaultdict
    f = open('relationships_w_pops_121708.txt')
    pop_ind = defaultdict(list)
    f.readline()  # header
    for line in f:
        toks = line.rstrip().split('\t')
        fam_id = toks[0]
        ind_id = toks[1]
        pop = toks[-1]
        pop_ind[pop].append((fam_id, ind_id))
    f.close()
- Let's check for consistency between the PLINK data file and the metadata, as we will need to clean up population mappings to generate a Genepop file, as shown in the following code:
    all_inds = []
    for inds in pop_ind.values():
        all_inds.extend(inds)
    for line in open('hapmap1.ped'):
        toks = line.rstrip().replace(' ', '\t').split('\t')
        ...
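
To make the idea of a consistency check concrete, here is a minimal sketch that runs on made-up in-memory data rather than the real files. It is not the recipe's elided code: the function names `load_pop_ind` and `find_unmapped` and the sample lines are hypothetical, and it only assumes that the first two columns of a PLINK PED line are the family ID and the individual ID.

```python
from collections import defaultdict
import io


def load_pop_ind(handle):
    """Map population -> list of (family ID, individual ID) pairs.

    Assumes a tab-separated file with one header line, the family and
    individual IDs in the first two columns, and the population last.
    """
    pop_ind = defaultdict(list)
    handle.readline()  # skip the header line
    for line in handle:
        toks = line.rstrip().split('\t')
        if len(toks) < 3:
            continue  # skip blank or malformed lines
        pop_ind[toks[-1]].append((toks[0], toks[1]))
    return pop_ind


def find_unmapped(ped_handle, pop_ind):
    """Return the (family ID, individual ID) pairs in the PED data
    that have no population assigned in the metadata."""
    known = {ind for inds in pop_ind.values() for ind in inds}
    missing = []
    for line in ped_handle:
        # PED files may be space- or tab-delimited, so normalize first
        toks = line.rstrip().replace(' ', '\t').split('\t')
        if len(toks) < 2:
            continue
        ind = (toks[0], toks[1])
        if ind not in known:
            missing.append(ind)
    return missing


# Hypothetical sample data, not taken from the HapMap files
metadata = io.StringIO(
    'FID\tIID\tpopulation\n'
    'F1\tI1\tCEU\n'
    'F2\tI2\tYRI\n'
)
ped = io.StringIO(
    'F1 I1 0 0 1 0 A A\n'
    'F3 I3 0 0 2 0 G G\n'
)

pop_ind = load_pop_ind(metadata)
missing = find_unmapped(ped, pop_ind)
print(missing)
```

With this sample input, the individual `('F3', 'I3')` is reported because it appears in the PED data but not in the metadata. Building a set of known individuals keeps each lookup cheap, which matters once the PED file has many lines.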