Implementing k-means using H2O over Spark
In this recipe, we'll run the k-means clustering algorithm on a prostate cancer dataset. Download the dataset from https://github.com/ChitturiPadma/datasets/blob/master/prostate.csv. The data comes from a study that examined the correlation between the level of prostate-specific antigen and a number of other clinical measures in men.
To step through this recipe, you will need a running Spark cluster in any one of the following modes: local, standalone, YARN, or Mesos. Include the Spark MLlib package in the build.sbt file so that the related libraries are downloaded and the API can be used. You will also need Scala and Java installed; Hadoop is optional. ...
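As a rough illustration of the build.sbt entry mentioned above, a minimal sketch might look like the following. The project name, Scala version, and Spark version are assumptions chosen for illustration, not values given in this recipe; match them to the Spark build running on your cluster. Only the Spark artifacts named in the text are shown.

```scala
// build.sbt (illustrative sketch; name and versions are assumptions)
name := "h2o-kmeans-recipe"      // hypothetical project name

scalaVersion := "2.11.8"         // assumed; must match the Scala version of your Spark build

libraryDependencies ++= Seq(
  // Spark core is usually supplied by the cluster at runtime, hence "provided"
  "org.apache.spark" %% "spark-core"  % "2.1.0" % "provided",
  // Spark MLlib, the package this recipe asks you to include
  "org.apache.spark" %% "spark-mllib" % "2.1.0"
)
```

The `%%` operator appends the Scala binary version to the artifact name, so the resolved libraries stay consistent with `scalaVersion`.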