Building graphs

Let's now open our Spark shell and build three types of graphs: a directed email communication network, a bipartite graph of ingredient-compound connections, and a multigraph using the previous graph builders.


Unless otherwise stated, we always assume that the Spark shell is launched from the $SPARKHOME directory. It then becomes the current directory for any relative file path used in this book.

Building directed graphs

The first graph that we will build is the Enron email communication network. If you have restarted your Spark shell, you need to import the GraphX library again. First, create a new folder called data inside $SPARKHOME and copy the dataset into it. This file contains the adjacency list of the email communications ...
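
As a rough sketch of the steps just described, the following can be pasted into the Spark shell. The file name emailEnron.txt is a placeholder of ours, not a name given in the text above, so substitute whatever the copied dataset is called. The sketch also assumes that each line of the file holds one pair of source and destination vertex IDs, which is the format GraphLoader.edgeListFile reads.

```scala
// Needed again after every restart of the Spark shell.
import org.apache.spark.graphx._

// Hypothetical file name: replace it with the dataset copied into $SPARKHOME/data.
// The relative path resolves against the directory the shell was launched from.
val edgeFile = "./data/emailEnron.txt"

// `sc` is the SparkContext that the Spark shell creates automatically.
// edgeListFile reads one "srcId dstId" pair per line, skips lines that start
// with '#', and sets every vertex and edge attribute to 1.
val emailGraph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, edgeFile)

// Quick sanity check on what was loaded.
println(s"vertices: ${emailGraph.vertices.count()}, edges: ${emailGraph.edges.count()}")
```

Because every edge keeps its source and destination as read from the file, the resulting graph is directed, which suits a network where a message goes from a sender to a recipient.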
