Mastering Machine Learning with Spark 2.x by Michal Malohlava, Max Pumperla, Alex Tellez

Graph representation in GraphX

Recall that, for our purposes, a property graph is a directed multigraph, possibly with loops, that carries custom data objects on both its vertices and its edges. The central entry point of GraphX is the Graph API, which has the following signature:

class Graph[VD, ED] {
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]
}

So, internally, a graph in GraphX is represented by two RDDs: one encoding the vertices and one encoding the edges. Here, VD is the vertex data type, and ED is the edge data type of our property graph. We will discuss both VertexRDD and EdgeRDD in more detail, as they are essential for what follows.
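To make the roles of VD and ED concrete, here is a minimal sketch that builds a small property graph with String as both the vertex and the edge data type. The object name, the vertex names, and the edge labels are invented for illustration, and the local master setting is only an assumption suitable for experimenting on a single machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not taken from the book.
object PropertyGraphSketch {

  // Builds a Graph[String, String]: VD = String (a name), ED = String (a label).
  def build(sc: SparkContext): Graph[String, String] = {
    // Vertices are pairs of a Long identifier and the vertex data.
    val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "Anne"),
      (2L, "Bernie"),
      (3L, "Chris")
    ))

    // Edges carry a source id, a destination id, and the edge data.
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "likes"),
      Edge(1L, 2L, "follows"), // parallel edge: allowed in a multigraph
      Edge(3L, 3L, "self")     // loop: source and destination coincide
    ))

    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setAppName("PropertyGraphSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = build(sc)
      // The two RDDs behind the graph can be inspected directly.
      graph.vertices.collect().foreach(println)
      graph.edges.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the two edges from vertex 1 to vertex 2 are both kept, and the edge from vertex 3 to itself is accepted, which matches the "directed multigraph with loops" description above.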

In Spark GraphX, vertices have unique identifiers of the Long type, which are called VertexId. A VertexRDD[VD] is, in fact, just an extension ...
