Data Algorithms by Mahmoud Parsian

Chapter 26. Gene Aggregation

This chapter provides four distinct solutions to gene aggregation (also known as marker frequency in clinical applications), in MapReduce/Hadoop and Spark. The input to gene aggregation is a set of patients’ biosets. As discussed in previous chapters, a bioset, also called a gene signature, encompasses data in the form of experimental sample comparisons (for transcriptomic, epigenetic, and copy-number variation data), as well as genotype signatures (for GWAS and mutational data). In simple terms, a bioset is a list of key-value pairs, where the key is a geneID and the value is a list of associated attributes. In clinical applications, gene aggregation is used to identify transcriptional signatures and patterns in gene expression data, and to see how genes are grouped together and how that grouping affects the overall analysis. Gene aggregation is an evolutionary method and depends on chromosomal folding and higher-order structures.
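The key-value view of a bioset described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `geneID;attr1,attr2,...`, the sample gene IDs and attribute values, and the names `Bioset` and `parse_bioset` are hypothetical and are not taken from the chapter.

```python
from typing import List, Tuple

# A bioset modeled as a list of (geneID, attributes) pairs.
Bioset = List[Tuple[str, List[str]]]


def parse_bioset(lines: List[str]) -> Bioset:
    """Turn raw text records into (geneID, [attributes]) pairs.

    Assumed (hypothetical) record layout: "<geneID>;<attr1>,<attr2>,..."
    """
    bioset: Bioset = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        # Everything before the first ';' is the key, the rest is the value.
        gene_id, _, attrs = line.partition(";")
        bioset.append((gene_id, [a for a in attrs.split(",") if a]))
    return bioset


# Hypothetical sample records, for illustration only.
sample = ["GENE1;r1,1.5", "", "GENE2;r2,-0.8"]
for gene_id, attributes in parse_bioset(sample):
    print(gene_id, attributes)
```

A list of pairs is used here rather than a dictionary so that the sketch stays close to the "list of key-value pairs" wording; a dictionary keyed by geneID would be a reasonable alternative if each gene appears at most once per bioset.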

Gene aggregation is achieved through three metrics:

  • Reference type refers to the type of patient data:

    • r1 = normal

    • r2 = disease

    • r3 = paired

    • r4 = unknown

  • Gene filter type refers to the type of filter applied to the data. The filter type indicates how gene values will be grouped and analyzed. For example, if the filter type is up, then only gene values greater than the filter value threshold are considered for further analysis. There are three gene filter types:

    • Absolute value (abs)
