September 2015
Intermediate to advanced
148 pages
3h 20m
English
Let's now open our Spark shell and use the graph builders introduced earlier to build three types of graphs: a directed email communication network, a bipartite graph of ingredient-compound connections, and a multigraph.
Unless otherwise stated, we always assume that the Spark shell is launched from the $SPARKHOME directory. It then becomes the current directory for any relative file path used in this book.
The first graph that we will build is the Enron email communication network. If you have restarted your Spark shell, you need to import the GraphX library again. First, create a new folder called data inside $SPARKHOME and copy the dataset into it. This file contains the adjacency list of the email communications ...
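The folder setup described above can be sketched in a few shell commands. This is a minimal sketch under stated assumptions: it falls back to the current directory when $SPARKHOME is not set, and DATASET is a hypothetical variable holding the path of the dataset file you downloaded, since the file name is not given here.

```shell
# Assumption: SPARKHOME points at the Spark installation directory.
# If it is not set, fall back to the current directory.
SPARK_DIR="${SPARKHOME:-$PWD}"

# Create the data folder inside the Spark directory (no error if it already exists).
mkdir -p "$SPARK_DIR/data"

# DATASET is a hypothetical variable: set it to the path of the downloaded dataset.
# The copy is skipped when DATASET is unset or does not point to a file.
if [ -f "${DATASET:-}" ]; then
  cp "$DATASET" "$SPARK_DIR/data/"
fi

# Make the Spark directory the current directory, so that relative
# paths such as ./data/... resolve as the text assumes.
cd "$SPARK_DIR"

# The Spark shell would then be launched from here, for example:
# ./bin/spark-shell
```

Launching the shell from this directory keeps relative paths short: anything placed in the data folder can be referred to with a path beginning with ./data/.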