January 2018
Intermediate to advanced
470 pages
11h 9m
English
As I already stated, the 24 VCF files together contribute 820 GB of data. Therefore, I decided to use only the genetic variants of chromosome Y to make the demonstration clearer. The file is around 160 MB, which does not pose a huge computational challenge. You can download all the VCF files as well as the panel file from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/.
Let's get started by creating a SparkSession, the gateway to the Spark application:
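import org.apache.spark.sql.SparkSession

// Create (or reuse) a local session that uses all available cores
val spark: SparkSession = SparkSession
  .builder()
  .appName("PopStrat")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "C:/Exp/")
  .getOrCreate()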
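Before loading any data, it can be useful to confirm that the session picked up the intended settings. The following is a minimal sketch, assuming Spark 2.x is on the classpath; the object name SessionCheck and the helper buildSession are hypothetical names introduced here for illustration, not part of the original project.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, used only to illustrate the session settings
object SessionCheck {

  // Builds the same kind of local session as in the text
  def buildSession(): SparkSession =
    SparkSession
      .builder()
      .appName("PopStrat")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "C:/Exp/")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()

    // Print the settings the session is actually running with
    println(s"Spark version: ${spark.version}")
    println(s"Master: ${spark.sparkContext.master}")
    println(s"App name: ${spark.sparkContext.appName}")
    println(s"Warehouse dir: ${spark.conf.get("spark.sql.warehouse.dir")}")

    // Release local resources once the check is done
    spark.stop()
  }
}
```

Since getOrCreate() returns an already running session if one exists, a check like this is best run on its own rather than in the middle of the main application.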
Then let's give Spark the paths to both the VCF file and the panel file:
val genotypeFile ...