Classifying data points with a Random Forest model using MLlib

In this recipe, we will demonstrate how to classify data points using the Random Forest algorithm with MLlib.

Getting ready

  1. You will use the Maven project you created in the recipe named Solving simple text mining problems with Apache Spark. If you have not created it yet, follow steps 1-6 in the Getting ready section of that recipe.
  2. Go to https://github.com/apache/spark/blob/master/data/mllib/sample_binary_classification_data.txt, download the data, and save it as rf-data.txt in the data folder of the project from step 1. Alternatively, you can create a text file named rf-data.txt in the data folder of your project and copy-paste the data ...
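
Before moving on, it can help to confirm that rf-data.txt was saved intact. The sketch below is not part of the recipe. It assumes the file uses the LIBSVM text layout (a label followed by space-separated index:value pairs) and that it sits at data/rf-data.txt relative to the working directory. The class name LibSvmLineCheck and its helper methods are hypothetical and use only the Java standard library, not Spark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the recipe): sanity-checks a LIBSVM-style text file.
public class LibSvmLineCheck {

    // Returns the class label, i.e. the first whitespace-separated token of the line.
    static double parseLabel(String line) {
        String[] tokens = line.trim().split("\\s+");
        return Double.parseDouble(tokens[0]);
    }

    // Returns the sparse features of one line as index -> value, in file order.
    static Map<Integer, Double> parseFeatures(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<Integer, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Malformed feature: " + tokens[i]);
            }
            features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
        return features;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another path as the first argument if needed.
        Path path = Paths.get(args.length > 0 ? args[0] : "data/rf-data.txt");
        if (!Files.exists(path)) {
            System.out.println("File not found: " + path.toAbsolutePath());
            return;
        }
        int rows = 0;
        int positives = 0;
        for (String line : Files.readAllLines(path)) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            parseFeatures(line); // throws if any index:value pair is malformed
            if (parseLabel(line) == 1.0) {
                positives++;
            }
            rows++;
        }
        System.out.println("Rows: " + rows + ", label 1: " + positives
                + ", other labels: " + (rows - positives));
    }
}
```

If the program reports a malformed feature or a row count of zero, the copy-paste or download most likely truncated the file, and it is worth fetching the data again before training.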
