Node features

Let's first concentrate on the features that we can compute for the nodes of a graph. For that, we need a graph library that makes it easy to compute graph features such as degree or PageRank. For Apache Spark, such a library is GraphX. However, at the moment, this library only supports Scala: it relies heavily on Scala-specific features, which makes it very hard (and often impossible) to use from Java.

There is, however, another library called GraphFrames, which tries to combine GraphX with DataFrames. Luckily for us, it supports Java. This package is not available on Maven Central, so to use it, we first need to add the following repository to our pom.xml:

<repository>  <id>bintray-spark</id> <url>https://dl.bintray.com/spark-packages/maven/</url> ...
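
To make the notion of a node feature concrete before bringing in a graph library, here is a minimal sketch in plain Java that computes the degree of every node from an undirected edge list. It does not use GraphX or GraphFrames; the class name NodeDegree, the method degrees, and the toy edge list are assumptions made purely for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any graph library.
class NodeDegree {

    // Counts how many edges touch each node of an undirected graph.
    // Each edge is a two-element array: {source, target}.
    static Map<String, Integer> degrees(String[][] edges) {
        Map<String, Integer> result = new TreeMap<>();
        for (String[] edge : edges) {
            // An undirected edge contributes 1 to the degree of both endpoints.
            result.merge(edge[0], 1, Integer::sum);
            result.merge(edge[1], 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy graph with four nodes and four edges (illustrative data only).
        String[][] edges = {
            {"a", "b"}, {"a", "c"}, {"b", "c"}, {"c", "d"}
        };
        System.out.println(degrees(edges)); // prints {a=2, b=2, c=3, d=1}
    }
}
```

On a real dataset the edge list would not fit comfortably in a single in-memory map, which is why a distributed graph library is useful for computing the same feature at scale.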
