Take a look at the following steps:
- Let's load the metadata, as follows:
f = open('relationships_w_pops_121708.txt')
ind_pop = {}
f.readline()  # header
for l in f:
    toks = l.rstrip().split('\t')
    fam_id = toks[0]
    ind_id = toks[1]
    pop = toks[-1]
    ind_pop['/'.join([fam_id, ind_id])] = pop
f.close()
ind_pop['2469/NA20281'] = ind_pop['2805/NA20281']
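Here, the final assignment adds an entry for individual NA20281 under family ID 2469, copying the population recorded under family ID 2805, so that the metadata is consistent with what is available in the PLINK file.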
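The metadata-loading step can also be wrapped in a small function, which makes it easy to try out on in-memory data. The following is only a sketch: the function name `load_ind_pop`, the header names, and the sample rows (`FAM1`, `IND1`, `POP_A`, and so on) are made up for illustration and are not real HapMap data. It assumes only what the code above relies on: a tab-separated file with one header line, the family ID in the first column, the individual ID in the second, and the population in the last.

```python
import io


def load_ind_pop(handle):
    """Map 'family_id/individual_id' to the population in the last column."""
    ind_pop = {}
    next(handle)  # skip the header line
    for line in handle:
        toks = line.rstrip().split('\t')
        if len(toks) < 3:
            continue  # skip blank or malformed lines
        ind_pop['/'.join([toks[0], toks[1]])] = toks[-1]
    return ind_pop


# Made-up, tab-separated sample rows (hypothetical, not real HapMap data)
sample = io.StringIO(
    "FID\tIID\tpopulation\n"
    "FAM1\tIND1\tPOP_A\n"
    "FAM2\tIND2\tPOP_B\n"
)
ind_pop = load_ind_pop(sample)

# Register the same individual under a second family ID,
# mirroring the manual fix applied to NA20281 above
ind_pop['FAM9/IND2'] = ind_pop['FAM2/IND2']
print(ind_pop)
```

With the real file, the same function could be called as `with open('relationships_w_pops_121708.txt') as f: ind_pop = load_ind_pop(f)`, which also closes the file automatically.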
- Let's convert the PLINK file into the EIGENSOFT format:
from genomics.popgen.plink.convert import to_eigen
to_eigen('hapmap10_auto_noofs_ld_12', 'hapmap10_auto_noofs_ld_12')
This uses a function that I have written to convert from PLINK to the EIGENSOFT format. This is mostly text manipulation—not ...