Training bisecting K-means in Spark ML follows the same approach as the other models: we pass a DataFrame that contains our training data to the fit method of the BisectingKMeans object. Note that here we use the libsvm data format:
- Instantiate the cluster object:
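import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Run locally on a single thread
val spConfig = (new SparkConf).setMaster("local[1]").setAppName("SparkApp").
  set("spark.driver.allowMultipleContexts", "true")

val spark = SparkSession
  .builder()
  .appName("Spark SQL Example")
  .config(spConfig)
  .getOrCreate()

// BASE is assumed to be defined elsewhere as the root directory of the dataset
val datasetUsers = spark.read.format("libsvm").load(
  BASE + "/movie_lens_2f_users_libsvm/part-00000")
datasetUsers.show(3)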
+-----+--------------------+ ...
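To make the training step concrete, the following is a minimal sketch rather than the exact code used in this chapter. A DataFrame loaded with the libsvm format has two columns, label and features, so the sketch builds a tiny in-memory DataFrame with the same schema instead of reading a file. The object name, the sample points, and the values passed to setK, setMaxIter, and setSeed are illustrative assumptions.

```scala
import org.apache.spark.ml.clustering.BisectingKMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch
object BisectingKMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("BisectingKMeansSketch")
      .getOrCreate()

    // Made-up sample data with the same schema that the libsvm reader
    // produces: a Double "label" column and a Vector "features" column
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 0.1)),
      (0.0, Vectors.dense(0.2, 0.0)),
      (1.0, Vectors.dense(9.0, 9.1)),
      (1.0, Vectors.dense(9.2, 8.9))
    )).toDF("label", "features")

    // k, maxIter, and seed are illustrative values, not tuned settings
    val bkm = new BisectingKMeans()
      .setK(2)
      .setMaxIter(20)
      .setSeed(1L)

    // fit reads the "features" column by default and returns a BisectingKMeansModel
    val model = bkm.fit(training)

    // Print one center per cluster found
    model.clusterCenters.foreach(println)

    spark.stop()
  }
}
```

With a real dataset, the in-memory DataFrame would be replaced by the one returned from spark.read.format("libsvm").load(...), and the rest of the sketch would stay the same.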