July 2017
Intermediate to advanced
796 pages
18h 55m
English
Calling filter() restricts the vertex set to the vertices that satisfy the given predicate. The operation preserves the index, which keeps joins with the original RDD efficient, and it sets bits in a bitmask rather than allocating new memory:
def filter(pred: Tuple2[VertexId, VD] => Boolean): VertexRDD[VD]
Using filter, we can keep only the vertex for user Mark, selecting it either by its VertexId or by the User.name attribute. We can also filter on the User.occupation attribute.
The following code does this:
scala> graph.vertices.filter(x => x._1 == 2).take(10)
res118: Array[(org.apache.spark.graphx.VertexId, User)] = Array((2,User(Mark,Doctor)))
scala> graph.vertices.filter(x => ...
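For readers who want to try the three filters outside the REPL session, here is a minimal standalone sketch. It assumes Spark with GraphX on the classpath and a local master. The User case class, the buildGraph helper, the edge labels, and every vertex other than (2, Mark, Doctor) are hypothetical stand-ins, since the original graph definition is not shown in this section.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical vertex attribute type, mirroring the User(name, occupation)
// shape visible in the REPL output above.
case class User(name: String, occupation: String)

object VertexFilterSketch {

  // Builds a tiny illustrative graph. Only vertex 2 (Mark, Doctor) is taken
  // from the text; the other vertices and the edges are made up.
  def buildGraph(sc: SparkContext): Graph[User, String] = {
    val users: RDD[(VertexId, User)] = sc.parallelize(Seq(
      (1L, User("Alice", "Engineer")),
      (2L, User("Mark", "Doctor")),
      (3L, User("Carol", "Doctor"))
    ))
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "friend"),
      Edge(2L, 3L, "colleague")
    ))
    Graph(users, edges)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("vertex-filter-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)

      // Filter by vertex ID (the first element of the (VertexId, User) pair).
      val byId = graph.vertices.filter { case (id, _) => id == 2L }

      // Filter by the name attribute of the vertex.
      val byName = graph.vertices.filter { case (_, user) => user.name == "Mark" }

      // Filter by the occupation attribute; this may match several vertices.
      val byOccupation = graph.vertices.filter { case (_, user) => user.occupation == "Doctor" }

      byId.collect().foreach(println)
      byName.collect().foreach(println)
      byOccupation.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Each call returns a VertexRDD[User] rather than a plain RDD, so the result can still be passed to VertexRDD operations such as innerJoin. The pattern-matching form `{ case (id, user) => ... }` is equivalent to the `x => x._1 == 2` style used above and only names the tuple fields for readability.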