Graph representation in GraphX

Recall that, for our purposes, a property graph is a directed multigraph with loops that carries custom data objects on both its vertices and its edges. The central entry point of GraphX is the Graph API, which has the following signature:

class Graph[VD, ED] {
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]
}

So, internally, a graph in GraphX is represented by two RDDs: one encoding the vertices and one encoding the edges. Here, VD is the vertex data type and ED is the edge data type of our property graph. We will discuss both VertexRDD and EdgeRDD in more detail, as they are essential for what follows.
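To make the two-RDD representation concrete, here is a minimal sketch that builds a small property graph with VD = String and ED = String. The object name, the application name, the local master setting, and the sample vertex and edge data are all assumptions chosen for illustration. The sketch needs Spark with the GraphX module on the classpath. It includes a pair of parallel edges and one loop, since a property graph is a directed multigraph with loops.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object PropertyGraphSketch {

  // Builds a tiny property graph; all names and attributes are made up.
  def buildGraph(sc: SparkContext): Graph[String, String] = {
    // Vertex RDD: pairs of (VertexId, VD), here with VD = String.
    val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "Anne"),
      (2L, "Bernie"),
      (3L, "Chris")
    ))

    // Edge RDD: Edge(srcId, dstId, attr), here with ED = String.
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "likes"),
      Edge(1L, 2L, "writes to"), // parallel edge between the same two vertices
      Edge(3L, 3L, "talks to")   // loop: source and destination coincide
    ))

    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    // Local Spark context for experimentation (illustrative settings).
    val conf = new SparkConf()
      .setAppName("property-graph-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    val graph = buildGraph(sc)

    // The graph exposes its two underlying RDDs directly.
    println(s"vertices: ${graph.vertices.count()}")
    println(s"edges: ${graph.edges.count()}")

    sc.stop()
  }
}
```

Constructing the graph this way does not merge the two parallel edges from vertex 1 to vertex 2, so both remain in graph.edges alongside the loop on vertex 3.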

In Spark GraphX, vertices have unique identifiers of type Long, which are called VertexId. A VertexRDD[VD] is, in fact, just an extension ...
