Java: Data Science Made Easy

by Richard M. Reese, Jennifer L. Reese, Alexey Grigorev
July 2017
Beginner to intermediate
715 pages
17h 3m
English
Packt Publishing

Link Prediction with MLlib and XGBoost

Now that all the data is prepared and in a suitable shape, we can train a model that predicts whether two authors are likely to become coauthors. For that we will use a binary classifier, trained to predict the probability that an edge between the two authors exists in the graph.
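To make the idea of "a binary classifier that outputs an edge probability" concrete, here is a minimal sketch in plain Java. It is not the MLlib or XGBoost code used in this chapter: it is a small from-scratch logistic regression trained with batch gradient descent. The class name, the two features (number of common neighbors and a Jaccard-style similarity), and the toy data are all assumptions invented for illustration.

```java
public class LinkPredictionSketch {

    /** Logistic (sigmoid) function: maps any real score into (0, 1). */
    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Probability that the edge exists; weights[0] is the bias term. */
    public static double predictProbability(double[] weights, double[] features) {
        double z = weights[0];
        for (int i = 0; i < features.length; i++) {
            z += weights[i + 1] * features[i];
        }
        return sigmoid(z);
    }

    /** Fits the weights with batch gradient descent on the log loss. */
    public static double[] train(double[][] x, int[] y, double learningRate, int iterations) {
        int n = x.length;
        int d = x[0].length;
        double[] w = new double[d + 1]; // starts at all zeros
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[d + 1];
            for (int r = 0; r < n; r++) {
                // Gradient of the log loss for one row is (prediction - label) * input
                double error = predictProbability(w, x[r]) - y[r];
                grad[0] += error;
                for (int j = 0; j < d; j++) {
                    grad[j + 1] += error * x[r][j];
                }
            }
            for (int j = 0; j <= d; j++) {
                w[j] -= learningRate * grad[j] / n;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: each row is {commonNeighbors, jaccardSimilarity}
        double[][] x = {
            {5, 0.60}, {4, 0.50}, {3, 0.40}, // pairs that became coauthors (label 1)
            {0, 0.00}, {1, 0.05}, {0, 0.10}  // pairs that did not (label 0)
        };
        int[] y = {1, 1, 1, 0, 0, 0};

        double[] w = train(x, y, 0.1, 5000);

        double likely = predictProbability(w, new double[] {4, 0.55});
        double unlikely = predictProbability(w, new double[] {0, 0.02});
        System.out.printf("likely pair: %.3f, unlikely pair: %.3f%n", likely, unlikely);
    }
}
```

The output of predictProbability is the kind of score we are after: a number between 0 and 1 for each candidate pair of authors, which can be thresholded or used to rank candidate edges. A library implementation does the same job at scale and with regularization and other refinements that this sketch leaves out.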

Apache Spark comes with a library called MLlib, which provides scalable implementations of several machine learning algorithms. Let's add it to our pom.xml:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.1.0</version>
</dependency>

There are a number of models we can use, including logistic regression, random forest, and Gradient Boosted ...


ISBN: 9781788475655