Data Algorithms with Spark

by Mahmoud Parsian
April 2022
Content level: Intermediate to advanced
435 pages
9h 44m
English
O'Reilly Media, Inc.

Chapter 6. Graph Algorithms

So far we’ve mainly focused on record data, which is typically stored in flat files or relational databases and can be represented as a matrix (a set of rows with named columns). Now we’ll turn our attention to graph-based data, which captures the relationships between two or more data points. Social network data is a common example: if “Alex” is a “friend” of “Jane” and “Jane” is a “friend” of “Bob,” these relationships form a graph. Airline/flight data is another; we’ll explore both of these (and others) in this chapter.
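To make the social network example concrete, here is a minimal sketch in plain Python (not Spark) that stores the two friendships as an edge list and derives an adjacency map from it. The names `edges` and `build_adjacency` are hypothetical, and treating "friend" as a symmetric relationship is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical edge list built from the example in the text:
# each tuple is (source, destination, relationship).
edges = [
    ("Alex", "Jane", "friend"),
    ("Jane", "Bob", "friend"),
]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: vertex -> set of neighbors.

    Assumption: friendship is symmetric, so each edge is stored
    in both directions.
    """
    adjacency = defaultdict(set)
    for src, dst, _relationship in edge_list:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return dict(adjacency)


adjacency = build_adjacency(edges)
for person in sorted(adjacency):
    print(person, "->", sorted(adjacency[person]))
```

In this sketch "Jane" ends up connected to both "Alex" and "Bob", which is the kind of relationship a row-and-column layout does not express directly.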

Data structures are specific ways of organizing and storing data in a computer so that it can be used efficiently. In addition to the linear data structures we’ve primarily worked with in previous chapters (arrays, lists, tuples, etc.), these include nonlinear structures such as trees, hash maps, and graphs.

This chapter introduces GraphFrames, an external package for Spark that provides APIs for representing directed and undirected graphs, querying and analyzing graphs, and running algorithms on graphs. We’ll start by exploring what graphs are and what they are used for, then look at how to use the GraphFrames API in PySpark to build and query graphs. We’ll dig into a few of the algorithms GraphFrames supports, such as triangle finding and motif finding, then walk through some practical, real-world applications.
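As a rough illustration of what "finding triangles" means, the following sketch uses plain Python rather than GraphFrames, so it runs without a Spark cluster. The sample data and the function name `find_triangles` are hypothetical. The `id`, `src`, and `dst` keys mirror the column names GraphFrames expects for its vertex and edge DataFrames, and ignoring edge direction is an assumption of this sketch.

```python
from itertools import combinations

# Hypothetical sample data. The "id", "src", and "dst" keys mirror the
# column names GraphFrames expects in its vertex and edge DataFrames.
vertices = [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}]
edges = [
    {"src": "a", "dst": "b"},
    {"src": "b", "dst": "c"},
    {"src": "c", "dst": "a"},
    {"src": "c", "dst": "d"},
]


def find_triangles(vertices, edges):
    """Return a sorted list of triangles, each a sorted 3-tuple of vertex ids.

    Assumption: edge direction is ignored and self-loops are skipped.
    """
    neighbors = {v["id"]: set() for v in vertices}
    for e in edges:
        if e["src"] != e["dst"]:
            neighbors.setdefault(e["src"], set()).add(e["dst"])
            neighbors.setdefault(e["dst"], set()).add(e["src"])

    triangles = set()
    for v, adjacent in neighbors.items():
        # Two neighbors of v that are also connected to each other
        # close a triangle with v.
        for u, w in combinations(sorted(adjacent), 2):
            if w in neighbors[u]:
                triangles.add(tuple(sorted((v, u, w))))
    return sorted(triangles)


triangles = find_triangles(vertices, edges)
print(triangles)
```

On a real dataset the same question would be asked through the GraphFrames API on a distributed graph; this single-machine version is only meant to show the idea on four vertices.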

ISBN: 9781492082378