May 2018
Beginner to intermediate
384 pages
10h 19m
English
We will proceed with the implementation of K-means with Spark ML as follows:
    import org.apache.spark.ml.feature.LabeledPoint
    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.ml.clustering.KMeans

    // In spark-shell, `sc` and the implicits required by toDF are already in scope.
    // Each input line is expected to hold three comma-separated numbers: label, x, y.
    val kmeansSampleData = sc.textFile("aibd/k-means-sample.txt")
    val labeledData = kmeansSampleData.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).toDouble, parts(2).toDouble))
    }.cache().toDF

    val kmeans = new KMeans()
      .setK(2) // Setting the number of clusters
      .setFeaturesCol("features")
      .setMaxIter(3) // default Max Iteration is 20
      .setPredictionCol("prediction")
      .setSeed(1L)

    val model = kmeans.fit(labeledData)
    // The training summary belongs to the fitted model.
    model.summary.predictions.show
    model.clusterCenters.foreach(println)
    ...
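To make the roles of `setK` and `setMaxIter` more concrete, here is a minimal plain-Scala sketch of the assignment and update rounds that K-means repeats. It is not Spark's implementation: Spark ML initializes centers with k-means|| by default and runs distributed, whereas this sketch simply takes the starting centers as an argument. The object name `KMeansSketch`, its helper functions, and the six sample points are all hypothetical and chosen only for illustration.

```scala
// Plain-Scala sketch of the K-means loop (Lloyd's algorithm); no Spark required.
// All names and data here are illustrative assumptions, not part of Spark ML.
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(points: Seq[Point]): Point = {
    val dim = points.head.length
    Vector.tabulate(dim)(d => points.map(_(d)).sum / points.size)
  }

  // Runs at most maxIter rounds, stopping early once the centers stop moving.
  // The number of clusters is the number of initial centers supplied.
  def fit(data: Seq[Point], initial: Seq[Point], maxIter: Int): Seq[Point] = {
    var centers: Seq[Point] = initial
    var iter = 0
    var changed = true
    while (iter < maxIter && changed) {
      // Assignment step: group points by the index of their nearest center.
      val groups = data.groupBy(p => closest(p, centers))
      // Update step: move each center to the mean of its group;
      // a center with no assigned points stays where it is.
      val updated = centers.indices.map { i =>
        groups.get(i).map(mean).getOrElse(centers(i))
      }
      changed = updated != centers
      centers = updated
      iter += 1
    }
    centers
  }

  def main(args: Array[String]): Unit = {
    // Made-up 2-D points forming two well-separated groups.
    val data = Seq(
      Vector(1.0, 1.0), Vector(1.0, 2.0), Vector(2.0, 1.0),
      Vector(8.0, 8.0), Vector(8.0, 9.0), Vector(9.0, 8.0)
    )
    // Two starting centers, so two clusters; at most three rounds.
    val centers = fit(data, initial = Seq(data.head, data.last), maxIter = 3)
    centers.foreach(println)
  }
}
```

With this sample data the two centers settle near (1.33, 1.33) and (8.33, 8.33) after the first round, so the loop stops before reaching the iteration cap. On less cleanly separated data, a low cap such as 3 may end the loop before the centers have stabilized.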