Apache Spark Solution for Rank Product
Date: This event took place live on August 25, 2015
Presented by: Mahmoud Parsian
Duration: Approximately 60 minutes.
Questions? Please send email to
The "rank product" is a statistical technique for detecting differentially regulated genes in replicated microarray experiments. The technique has achieved widespread acceptance and is now used more broadly, in fields as diverse as RNAi analysis, proteomics, and machine learning. It may also be used to rank users (in social networks) and items (such as products on Amazon.com).
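As a rough illustration of the idea, the rank product of an item is commonly computed as the geometric mean of its ranks across replicates. The sketch below is plain Python, not the webcast's code; the function names, the sample scores, and the choice to rank by descending score (with ties broken by input order) are assumptions made for the example.

```python
from math import prod

def ranks(scores):
    """Rank items within one replicate by descending score (1 = highest).

    Assumption: ties are broken by input order, not averaged.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}

def rank_product(replicates):
    """Geometric mean of each item's ranks across all replicates."""
    per_replicate = [ranks(r) for r in replicates]
    k = len(per_replicate)
    items = per_replicate[0].keys()
    return {g: prod(r[g] for r in per_replicate) ** (1.0 / k) for g in items}

# Hypothetical scores for three genes in two replicates.
replicates = [
    {"g1": 9.0, "g2": 4.0, "g3": 1.0},
    {"g1": 8.0, "g2": 2.0, "g3": 5.0},
]
result = rank_product(replicates)
```

An item that ranks near the top in every replicate ends up with a small rank product, which is what makes the measure useful for picking out consistently high-ranking genes, users, or items.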
Given a large set of genes, users, or items, in this webcast I will present two distinct Spark solutions for computing the "rank product": one using groupByKey() and one using combineByKey().
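The difference between the two approaches can be sketched without a cluster. The plain-Python emulation below is not Spark and is not the presenter's code: `group_by_key` and `combine_by_key` are hypothetical helpers that mimic the grouping pattern and the three-function pattern (create combiner, merge value, merge combiners) that Spark's combineByKey() takes, and the input is assumed to be (item, rank) pairs already split into partitions. The rank product is taken here as the geometric mean of an item's ranks.

```python
from collections import defaultdict
from math import prod

def group_by_key(partitions):
    """Grouping pattern: gather every value for a key, then reduce afterwards."""
    grouped = defaultdict(list)
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)
    return dict(grouped)

def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Combiner pattern: aggregate inside each partition, then merge partials."""
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)
        partials.append(acc)
    merged = {}
    for acc in partials:
        for key, comb in acc.items():
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged

# Hypothetical (item, rank) pairs, one partition per replicate.
partitions = [
    [("g1", 1), ("g2", 2), ("g3", 3)],
    [("g1", 1), ("g3", 2), ("g2", 3)],
]

# Approach 1: group all ranks per item, then take the geometric mean.
via_group = {g: prod(rs) ** (1.0 / len(rs))
             for g, rs in group_by_key(partitions).items()}

# Approach 2: carry only a running (product, count) pair per item.
combined = combine_by_key(
    partitions,
    lambda rank: (rank, 1),                         # create combiner
    lambda acc, rank: (acc[0] * rank, acc[1] + 1),  # merge value
    lambda a, b: (a[0] * b[0], a[1] + b[1]),        # merge combiners
)
via_combine = {g: p ** (1.0 / n) for g, (p, n) in combined.items()}
```

Both approaches give the same result. The combiner version keeps only a small (product, count) pair per item instead of the full list of ranks, which is the usual reason to prefer it on large inputs. With many replicates, summing logarithms of ranks instead of multiplying them may be a safer choice in languages with fixed-width numbers.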
About Mahmoud Parsian
Mahmoud Parsian, Ph.D. in Computer Science, is a practicing software professional with 30 years of experience as a developer, designer, architect, and author. For the past 15 years, he has worked on server-side Java, databases, MapReduce, and distributed computing. Dr. Parsian is currently with Illumina, where he leads the "Big Data" team. He leads the development of scalable regression algorithms and of DNA-Seq and RNA-Seq pipelines using Java, MapReduce/Hadoop/HBase/Spark, and open source tools.