Apache Spark Graph Processing by Rindra Ramamonjison

Performance optimization

In addition to the sendMsg and mergeMsg functions, aggregateMessages takes an optional third argument of type TripletFields, which indicates which data in EdgeContext the sendMsg function accesses. Specifying this explicitly helps GraphX optimize the performance of the aggregateMessages operation.

TripletFields represents a subset of the fields of EdgeTriplet, which lets GraphX populate only the fields that are actually needed.
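As a minimal sketch of this idea, the following counts incoming edges per vertex. Because sendMsg reads no vertex or edge attribute, TripletFields.None can be passed. The object name InDegreeSketch, the helper inDegrees, and the toy graph are illustrative assumptions, not code from the book.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexRDD}

// Hypothetical example object; names are chosen for illustration only.
object InDegreeSketch {

  // Counts incoming edges per vertex.
  def inDegrees(graph: Graph[String, Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](
      ctx => ctx.sendToDst(1), // sendMsg: reads neither srcAttr, dstAttr, nor attr
      _ + _,                   // mergeMsg: add up the counts arriving at a vertex
      TripletFields.None       // tells GraphX that no attribute needs to be populated
    )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tripletfields-none").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Toy graph (assumed data): edges 1 -> 2, 3 -> 2, 2 -> 3
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 3L, 1)))
    val graph = Graph(vertices, edges)

    // Vertex 2 receives two messages and vertex 3 receives one;
    // vertex 1 receives none, so it does not appear in the result.
    inDegrees(graph).collect().sortBy(_._1).foreach(println)

    sc.stop()
  }
}
```

If sendMsg in this sketch were changed to read ctx.srcAttr or ctx.attr, the TripletFields argument would have to be widened accordingly, since fields that are not requested are not guaranteed to be populated.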

The default value is TripletFields.All, which means that the sendMsg function may access any of the fields in the EdgeContext class. Otherwise, the TripletFields argument is used to tell GraphX that only part of EdgeContext will be required so that an efficient join strategy can be ...
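To illustrate requesting only part of EdgeContext, here is a hedged sketch in which sendMsg reads just the source vertex attribute, so TripletFields.Src is passed instead of the default TripletFields.All. The object SrcOnlySketch, the helper sumOfSourceValues, and the sample values are assumptions made for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexRDD}

// Hypothetical example object; names are chosen for illustration only.
object SrcOnlySketch {

  // For each vertex, sums the attributes of the vertices that point to it.
  def sumOfSourceValues(graph: Graph[Double, Int]): VertexRDD[Double] =
    graph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr), // sendMsg: only the source attribute is read
      _ + _,                             // mergeMsg: sum the incoming values
      TripletFields.Src                  // only source vertex attributes are requested
    )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tripletfields-src").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Toy graph (assumed data): edges 1 -> 2, 3 -> 2, 2 -> 3
    val vertices = sc.parallelize(Seq((1L, 10.0), (2L, 20.0), (3L, 30.0)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 3L, 1)))
    val graph = Graph(vertices, edges)

    // Vertex 2 receives 10.0 + 30.0 and vertex 3 receives 20.0.
    sumOfSourceValues(graph).collect().sortBy(_._1).foreach(println)

    sc.stop()
  }
}
```

The other predefined values follow the same pattern: TripletFields.Dst for the destination attribute only and TripletFields.EdgeOnly for the edge attribute only.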