Data Algorithms with Spark

by Mahmoud Parsian
April 2022
Content level: Intermediate to advanced
435 pages
9h 44m
English
O'Reilly Media, Inc.

Chapter 8. Ranking Algorithms

This chapter introduces the following two ranking algorithms and presents their associated implementations in PySpark:

Rank product

This algorithm finds the rank of each item (such as a gene) among all items by combining its ranks across several replicated experiments. It was originally developed for the detection of differentially expressed genes in replicated microarray experiments, but has since achieved widespread acceptance and is now used more broadly, including in machine learning. Spark does not provide an API for the rank product, so I will present a custom solution.

PageRank

PageRank is an iterative algorithm for measuring the importance of nodes in a given graph. This algorithm is used heavily by search engines (such as Google) to find the importance of each web page (document) relative to all web pages (a set of documents). In a nutshell, given a set of web pages, the PageRank algorithm calculates a quality ranking for each page. The Spark API offers multiple solutions for the PageRank algorithm. I’ll present one of those, using the GraphFrames API, as well as two custom solutions.
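
To make the iterative idea behind PageRank concrete, here is a minimal sketch in plain Python rather than PySpark or GraphFrames, so that it stays self-contained. It is not one of the book's solutions. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the tiny three-page graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict of page -> rank for a graph given as page -> list of linked pages.

    Assumptions (illustrative only): ranks start uniform, the loop runs a fixed
    number of iterations, and a page with no outgoing links spreads its rank
    evenly over all pages.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's damped rank equally among the pages it links to.
                share = damping * ranks[node] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling page: spread its damped rank over all pages.
                share = damping * ranks[node] / n
                for target in nodes:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page B receives only half of A's rank, so it ends up ranked below both A and C. A distributed version would express the same contribution-and-sum step over an RDD or DataFrame of edges.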

Rank Product

The rank product is an algorithm commonly used in the field of bioinformatics, also known as computational biology. It was originally developed as a biologically motivated test for the detection of differentially expressed genes in replicated microarray experiments. As well as expression profiling, it can be ...
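
As a rough illustration of the computation, the rank product of an item is usually defined as the geometric mean of its ranks across the replicated experiments. The sketch below is plain Python, not the book's PySpark solution. The helper names `ranks_descending` and `rank_product`, the sample values, the choice to give rank 1 to the largest value, the restriction to genes present in every study, and the absence of tie handling are all assumptions.

```python
import math


def ranks_descending(study):
    """Map each gene to its rank within one study (rank 1 = largest value).

    Assumes values are distinct; ties are not handled specially.
    """
    ordered = sorted(study, key=study.get, reverse=True)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def rank_product(studies):
    """Return gene -> geometric mean of its ranks over all studies.

    `studies` is a list of dicts (gene -> measured value). Only genes that
    appear in every study are ranked (an assumption made for this sketch).
    """
    common = set(studies[0]).intersection(*studies[1:])
    per_study = [
        ranks_descending({gene: study[gene] for gene in common})
        for study in studies
    ]
    k = len(studies)
    return {
        gene: math.prod(ranks[gene] for ranks in per_study) ** (1.0 / k)
        for gene in common
    }


# Hypothetical values for three genes measured in two replicated studies.
studies = [
    {"g1": 2.5, "g2": 1.0, "g3": 0.4},
    {"g1": 1.8, "g2": 2.2, "g3": 0.1},
]
rp = rank_product(studies)
for gene in sorted(rp, key=rp.get):
    print(gene, round(rp[gene], 4))
```

A smaller rank product means the gene ranks consistently near the top across studies. In a Spark setting, the per-study ranking and the per-gene product would each become a distributed step keyed by study and by gene, respectively.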


Publisher Resources

ISBN: 9781492082378