October 2017
Intermediate to advanced
1159 pages
26h 10m
English
In this case study, we use the CoverType dataset to demonstrate classification and clustering algorithms from the H2O, Apache Spark MLlib, and SAMOA machine learning libraries in Java.
The CoverType dataset, available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Covertype), contains unscaled cartographic data for 581,012 cells of forest land, each measuring 30 m x 30 m, accompanied by the actual forest cover type labels. In the experiments conducted here, we use the normalized version of the data. Including the one-hot encoding of the two categorical attributes, each row has a total of 54 attributes.
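To make the attribute count concrete, the following is a minimal sketch of how a 54-attribute row could be assembled. It assumes the split given in the UCI dataset description (10 quantitative attributes, a wilderness area with 4 categories, and a soil type with 40 categories); the text above states only the total of 54. The class `CoverTypeRow`, its method names, and the zero-based category indices are hypothetical and are not taken from any of the libraries discussed here.

```java
// Illustrative sketch only: class, method, and constant names are hypothetical.
// The 10 + 4 + 40 split is an assumption based on the UCI dataset description.
final class CoverTypeRow {
    static final int NUM_QUANTITATIVE = 10; // elevation, slope, distances, ...
    static final int NUM_WILDERNESS = 4;    // categorical attribute 1
    static final int NUM_SOIL = 40;         // categorical attribute 2
    static final int NUM_ATTRIBUTES =
            NUM_QUANTITATIVE + NUM_WILDERNESS + NUM_SOIL; // 54

    // Returns a vector of the given size with a single 1.0 at the given index.
    static double[] oneHot(int index, int size) {
        if (index < 0 || index >= size) {
            throw new IllegalArgumentException(
                    "index " + index + " out of range for size " + size);
        }
        double[] encoded = new double[size];
        encoded[index] = 1.0;
        return encoded;
    }

    // Concatenates the quantitative values with the two one-hot blocks.
    // Category indices are zero-based in this sketch.
    static double[] encode(double[] quantitative, int wildernessArea, int soilType) {
        if (quantitative.length != NUM_QUANTITATIVE) {
            throw new IllegalArgumentException(
                    "expected " + NUM_QUANTITATIVE + " quantitative values");
        }
        double[] row = new double[NUM_ATTRIBUTES];
        System.arraycopy(quantitative, 0, row, 0, NUM_QUANTITATIVE);
        System.arraycopy(oneHot(wildernessArea, NUM_WILDERNESS), 0,
                row, NUM_QUANTITATIVE, NUM_WILDERNESS);
        System.arraycopy(oneHot(soilType, NUM_SOIL), 0,
                row, NUM_QUANTITATIVE + NUM_WILDERNESS, NUM_SOIL);
        return row;
    }

    public static void main(String[] args) {
        // Placeholder quantitative values (all zeros), not real data.
        double[] row = encode(new double[NUM_QUANTITATIVE], 2, 7);
        System.out.println("attributes per row: " + row.length);
    }
}
```

In this layout, the first 10 positions hold the quantitative values, the next 4 hold the wilderness-area indicator, and the last 40 hold the soil-type indicator, so exactly two of the 44 indicator positions are set to 1 in any row.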
First, we treat the problem as one of classification using the labels included ...