Case study

In this case study, we use the CoverType dataset to demonstrate classification and clustering algorithms from the H2O, Apache Spark MLlib, and SAMOA machine learning libraries in Java.

Business problem

The CoverType dataset, available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Covertype), contains unscaled cartographic data for 581,012 cells of forest land, each measuring 30 m x 30 m, accompanied by the actual forest cover type labels. In the experiments conducted here, we use the normalized version of the data. With the two categorical attributes one-hot encoded, each row has a total of 54 attributes.
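
To make the 54-attribute layout concrete, the following is a minimal sketch of how one row could be assembled. It assumes the breakdown given in the UCI dataset description (10 quantitative attributes, 4 wilderness-area indicators, and 40 soil-type indicators) and uses min-max scaling as a stand-in for normalization, since the text does not say which normalization was applied. The class name CoverTypeRow and its methods are hypothetical and are not part of H2O, Spark MLlib, or SAMOA.

```java
public class CoverTypeRow {
    // Attribute layout assumed from the UCI dataset description.
    static final int NUM_QUANTITATIVE = 10;
    static final int NUM_WILDERNESS_AREAS = 4;
    static final int NUM_SOIL_TYPES = 40;
    static final int NUM_ATTRIBUTES =
            NUM_QUANTITATIVE + NUM_WILDERNESS_AREAS + NUM_SOIL_TYPES; // 54

    // Min-max scaling to [0, 1]. This is an assumption: the text only says
    // the data is "normalized" without naming the method.
    static double minMax(double value, double min, double max) {
        return max == min ? 0.0 : (value - min) / (max - min);
    }

    // Builds one 54-element row from raw quantitative values and two
    // zero-based category indices (wilderness area and soil type).
    static double[] encode(double[] quantitative, double[] mins, double[] maxs,
                           int wildernessArea, int soilType) {
        if (quantitative.length != NUM_QUANTITATIVE
                || mins.length != NUM_QUANTITATIVE
                || maxs.length != NUM_QUANTITATIVE) {
            throw new IllegalArgumentException("expected 10 quantitative values");
        }
        if (wildernessArea < 0 || wildernessArea >= NUM_WILDERNESS_AREAS
                || soilType < 0 || soilType >= NUM_SOIL_TYPES) {
            throw new IllegalArgumentException("category index out of range");
        }
        double[] row = new double[NUM_ATTRIBUTES];
        for (int i = 0; i < NUM_QUANTITATIVE; i++) {
            row[i] = minMax(quantitative[i], mins[i], maxs[i]);
        }
        // One-hot encoding: exactly one indicator is set in each group.
        row[NUM_QUANTITATIVE + wildernessArea] = 1.0;
        row[NUM_QUANTITATIVE + NUM_WILDERNESS_AREAS + soilType] = 1.0;
        return row;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the dataset.
        double[] raw = new double[NUM_QUANTITATIVE];
        double[] mins = new double[NUM_QUANTITATIVE];
        double[] maxs = new double[NUM_QUANTITATIVE];
        java.util.Arrays.fill(raw, 5.0);
        java.util.Arrays.fill(maxs, 10.0);
        double[] row = encode(raw, mins, maxs, 2, 7);
        System.out.println("attributes per row: " + row.length);
    }
}
```

The cover type label is kept separate from the 54 attributes, so the same row can feed either a classifier (with the label) or a clustering algorithm (without it).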

Machine Learning mapping

First, we treat the problem as one of classification using the labels included ...
