Link Prediction with MLlib and XGBoost

Now that all the data is prepared and in a suitable shape, we can train a model that predicts whether two authors are likely to become coauthors. For this we will use a binary classifier, trained to predict the probability that an edge between the two authors exists in the graph.
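To make the idea of a probability-producing binary classifier concrete, here is a minimal plain-Java sketch of the scoring step of a linear (logistic) model. It is not MLlib code. It assumes each candidate author pair has already been turned into a numeric feature vector and that the weights and bias were learned elsewhere. The class name `EdgeScorer`, the two features, and all numbers in `main` are hypothetical.

```java
import java.util.Arrays;

/**
 * Sketch of the scoring step of a logistic binary classifier for link prediction.
 * Assumption (hypothetical): each author pair is already a numeric feature vector,
 * and the weights/bias come from a previously trained model.
 */
final class EdgeScorer {
    private final double[] weights;
    private final double bias;

    EdgeScorer(double[] weights, double bias) {
        this.weights = Arrays.copyOf(weights, weights.length);
        this.bias = bias;
    }

    /** Logistic function: maps any real-valued score into (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Probability that the edge (co-authorship) exists, given the pair's features. */
    double probability(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("feature vector has wrong length");
        }
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return sigmoid(z);
    }

    /** Turns the probability into a yes/no prediction using a threshold. */
    boolean predictEdge(double[] features, double threshold) {
        return probability(features) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical features: [number of common coauthors, Jaccard similarity]
        EdgeScorer scorer = new EdgeScorer(new double[] {0.8, 2.0}, -3.0);
        double p = scorer.probability(new double[] {3.0, 0.5});
        System.out.printf("P(edge) = %.3f%n", p);
        System.out.println("predicted edge: " + scorer.predictEdge(new double[] {3.0, 0.5}, 0.5));
    }
}
```

With these made-up weights the linear score is -3 + 0.8 * 3 + 2.0 * 0.5 = 0.4, so the printed probability is about 0.599, and a threshold of 0.5 turns it into a positive prediction.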

Apache Spark comes with a library that provides scalable implementations of several machine learning algorithms. This library is called MLlib. Let's add it to our pom.xml:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.1.0</version>
</dependency>

There are a number of models we can use, including logistic regression, random forest, and Gradient Boosted ...
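Logistic regression is the simplest of these models. The toy sketch below trains one in plain Java with batch gradient descent on the average log-loss, only to show what such a model learns from labeled pairs (1 = the authors became coauthors, 0 = they did not). It is not how MLlib implements it; in Spark the distributed `LogisticRegression` estimator from `org.apache.spark.ml.classification` plays this role. The class name `ToyLogisticRegression`, the learning rate, the epoch count, and the training data are all illustrative assumptions.

```java
import java.util.Arrays;

/**
 * Toy, single-machine logistic regression trained with batch gradient descent.
 * NOT Spark MLlib's implementation; learning rate and epochs are arbitrary choices.
 */
final class ToyLogisticRegression {
    private final double[] weights;
    private double bias;

    ToyLogisticRegression(int numFeatures) {
        this.weights = new double[numFeatures];
        this.bias = 0.0;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Predicted probability that the label is 1 (the edge exists). */
    double predictProbability(double[] x) {
        double z = bias;
        for (int j = 0; j < weights.length; j++) {
            z += weights[j] * x[j];
        }
        return sigmoid(z);
    }

    /** Minimizes the average log-loss; labels must be 0 (no edge) or 1 (edge). */
    void fit(double[][] xs, int[] labels, double learningRate, int epochs) {
        int n = xs.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[weights.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                // Gradient of log-loss w.r.t. the linear score is (p - y)
                double error = predictProbability(xs[i]) - labels[i];
                for (int j = 0; j < weights.length; j++) {
                    gradW[j] += error * xs[i][j];
                }
                gradB += error;
            }
            // Step against the averaged gradient
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= learningRate * gradW[j] / n;
            }
            bias -= learningRate * gradB / n;
        }
    }

    double[] getWeights() {
        return Arrays.copyOf(weights, weights.length);
    }

    public static void main(String[] args) {
        // Made-up training pairs with a single feature each (e.g. a similarity score)
        double[][] xs = {{0.1}, {0.2}, {0.3}, {0.7}, {0.8}, {0.9}};
        int[] ys = {0, 0, 0, 1, 1, 1};
        ToyLogisticRegression model = new ToyLogisticRegression(1);
        model.fit(xs, ys, 1.0, 2000);
        System.out.println(model.predictProbability(new double[] {0.85}));
        System.out.println(model.predictProbability(new double[] {0.15}));
    }
}
```

Because the six made-up points are separable around 0.5, the trained model should return a probability above 0.5 for 0.85 and below 0.5 for 0.15.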
