November 2018
Intermediate to advanced
360 pages
9h 36m
English
Preparing the data requires some work. To save you the effort, we provide the final phased haplotypes for our dataset in the notebook directory, so you can skip the process described below, apart from the trivial download of integrated_call_samples.20101123.ped. The repository directory is Chapter10. The notebook is Germline.ipynb, the data file is good.match.gz, and the supporting code files are merge.py and clean_sample.py.
As an example, we will use a phased dataset (chromosome 21) from the 1000 Genomes Project. You can download it with the following command:
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/shapeit2_phased_haplotypes/ALL.chr21.SHAPEIT2_integrated_phase1_v3.20101123.snps_indels_svs.genotypes.all.vcf.gz ...
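Once the download finishes, it can be worth confirming that the file is readable and seeing how many individuals it contains before any further processing. The following is a minimal sketch rather than part of the recipe's own code: the helper name vcf_sample_ids is hypothetical, and it assumes only the standard VCF layout, where the header line beginning with #CHROM lists nine fixed columns followed by one column per sample.

```python
import gzip
import sys


def vcf_sample_ids(path):
    """Return the sample IDs listed on the #CHROM header line of a gzipped VCF.

    Hypothetical helper for illustration; it is not part of the recipe's code.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed VCF fields
                # (CHROM through FORMAT); sample IDs follow them.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                # Reached the data records without seeing a header line.
                break
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the downloaded .vcf.gz file as the first argument.
    samples = vcf_sample_ids(sys.argv[1])
    print(len(samples), "samples; first few:", samples[:5])
```

Because the function stops reading at the header line, it returns quickly even for a large compressed file.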