Mahmoud Parsian

Apache Spark Solution for Rank Product

Date: This event took place live on August 25, 2015

Presented by: Mahmoud Parsian

Duration: Approximately 60 minutes.

Questions? Please send email to

Description:

The "rank product" is a statistical technique for detecting differentially regulated genes in replicated microarray experiments. The technique has achieved widespread acceptance and is now used more broadly, in fields as diverse as RNAi analysis, proteomics, and machine learning. It may also be used to rank users (in social networks) and items (such as products on Amazon.com).
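As a rough illustration of the idea, the rank product of an item is commonly computed as the geometric mean of its ranks across the replicates. The sketch below is plain Python, not material from the webcast; the function names, the sample scores, and the choice to give rank 1 to the highest score (with no tie handling) are all assumptions made for the example.

```python
from math import prod


def rank_values(scores):
    """Map each item to its rank; rank 1 goes to the highest score.

    Assumption: ties are not handled specially, they fall in sort order.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


def rank_product(replicates):
    """Geometric mean of each item's ranks across all replicates.

    Assumption: every replicate contains the same set of items.
    """
    ranks = [rank_values(scores) for scores in replicates]
    k = len(ranks)
    return {
        item: prod(r[item] for r in ranks) ** (1.0 / k)
        for item in ranks[0]
    }


# Hypothetical scores for three genes measured in two replicates.
replicates = [
    {"g1": 2.5, "g2": 1.0, "g3": 0.3},
    {"g1": 1.8, "g2": 2.2, "g3": 0.1},
]
result = rank_product(replicates)
```

A smaller rank product means the item ranked consistently near the top; here "g3" ranks last in both replicates and so receives the largest value.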

Given a large set of genes, users, or items, in this webcast I will present two distinct Spark solutions (one using groupByKey() and one using combineByKey()) for computing the "rank product".
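To give a feel for how the two approaches differ, here is a minimal plain-Python sketch of the combineByKey() pattern applied to (item, rank) pairs. It is not the code presented in the webcast: the three combiner functions, the `finalize` helper, the `local_combine_by_key` stand-in for Spark, and the sample data are all hypothetical, and the example assumes ranks have already been assigned within each replicate.

```python
# A combiner is a (product_of_ranks, count) pair kept per item.

def create_combiner(rank):
    """Start a combiner from the first rank seen for an item."""
    return (rank, 1)


def merge_value(combiner, rank):
    """Fold one more rank into a combiner (within a partition)."""
    product, count = combiner
    return (product * rank, count + 1)


def merge_combiners(left, right):
    """Merge two combiners built on different partitions."""
    return (left[0] * right[0], left[1] + right[1])


def finalize(combiner):
    """Turn (product, count) into the geometric mean of the ranks."""
    product, count = combiner
    return product ** (1.0 / count)


def local_combine_by_key(partitions):
    """Local stand-in that mimics combineByKey semantics, for illustration only."""
    merged = {}
    for partition in partitions:
        # Combine within the partition first, as Spark does before shuffling.
        local = {}
        for key, rank in partition:
            if key in local:
                local[key] = merge_value(local[key], rank)
            else:
                local[key] = create_combiner(rank)
        # Then merge the per-partition combiners.
        for key, combiner in local.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return {key: finalize(combiner) for key, combiner in merged.items()}


# Hypothetical (item, rank) pairs, one partition per replicate.
partitions = [
    [("g1", 1), ("g2", 2), ("g3", 3)],
    [("g1", 2), ("g2", 1), ("g3", 3)],
]
rank_products = local_combine_by_key(partitions)
```

With PySpark, the same three functions could plausibly be passed to `pairs.combineByKey(create_combiner, merge_value, merge_combiners).mapValues(finalize)`, where `pairs` is an assumed RDD of (item, rank) tuples. A groupByKey() version would instead collect every rank for an item and compute the product afterwards, which moves all ranks across the network, whereas combineByKey() only moves one small combiner per item per partition.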

About Mahmoud Parsian

Mahmoud Parsian, Ph.D. in Computer Science, is a practicing software professional with 30 years of experience as a developer, designer, architect, and author. For the past 15 years, he has worked on server-side Java, databases, MapReduce, and distributed computing. Dr. Parsian is currently with Illumina, where he leads the "Big Data" team. He leads the development of scalable regression algorithms and of DNA-Seq and RNA-Seq pipelines using Java, MapReduce/Hadoop/HBase/Spark, and open source tools.

