Bisecting K-means - training a clustering model

Training for bisecting K-means in Spark ML takes an approach similar to the other models: we pass a DataFrame that contains our training data to the fit method of the BisectingKMeans object. Note that here we use the libsvm data format:

  1. Instantiate the cluster object:
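        import org.apache.spark.SparkConf
        import org.apache.spark.sql.SparkSession

        // Local Spark configuration with a single worker thread
        val spConfig = new SparkConf()
          .setMaster("local[1]")
          .setAppName("SparkApp")
          .set("spark.driver.allowMultipleContexts", "true")
        val spark = SparkSession
          .builder()
          .appName("Spark SQL Example")
          .config(spConfig)
          .getOrCreate()
        // Load the user features stored in libsvm format
        // (BASE is the data directory, defined elsewhere)
        val datasetUsers = spark.read.format("libsvm").load(
          BASE + "/movie_lens_2f_users_libsvm/part-00000")
        datasetUsers.show(3)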
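Spark's libsvm data source reads each line as a label followed by sparse `index:value` pairs, where the indices are one-based and ascending, and turns it into a label column and a sparse features vector with zero-based indices. The following is a minimal plain-Scala sketch of that mapping for a single line, not Spark's own reader; the object name `LibsvmLineSketch`, the function `parseLibsvmLine`, and the sample line are hypothetical.

```scala
object LibsvmLineSketch {
  // Parse one libsvm line "label idx:value idx:value ..." into the label and
  // the zero-based indices/values of a sparse feature vector.
  def parseLibsvmLine(line: String): (Double, Array[Int], Array[Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { tok =>
      val Array(idx, value) = tok.split(":")
      (idx.toInt - 1, value.toDouble) // libsvm indices are one-based; shift to zero-based
    }
    val indices = pairs.map(_._1)
    require(indices.forall(_ >= 0), "indices must be one-based")
    require(indices.zip(indices.drop(1)).forall { case (a, b) => a < b },
      "indices must be strictly ascending")
    (label, indices, pairs.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical line: label 1, non-zero features at libsvm positions 1 and 3.
    val (label, indices, values) = parseLibsvmLine("1 1:24.0 3:1.0")
    println(s"label=$label indices=${indices.mkString(",")} values=${values.mkString(",")}")
  }
}
```

Running `main` prints `label=1.0 indices=0,2 values=24.0,1.0`: positions 1 and 3 in the file become entries 0 and 2 of the feature vector.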
The output of the command show(3) is shown here:
 +-----+--------------------+ ...
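The training call itself follows the usual Spark ML estimator pattern: configure a `BisectingKMeans` instance, then pass a DataFrame with a `features` column to `fit`. The following is a minimal sketch, assuming Spark 2.x or later with spark-mllib on the classpath; the object name `BisectingKMeansSketch`, the toy points, and `k = 2` are hypothetical, and in the flow above you would pass `datasetUsers` (whose libsvm load already provides a `features` column) to `train` instead of the toy data.

```scala
import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object BisectingKMeansSketch {
  // Hypothetical toy data standing in for datasetUsers:
  // two well-separated groups of 2-D points in a "features" column.
  def toyData(spark: SparkSession): DataFrame = {
    val points = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2), Vectors.dense(0.2, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 8.9), Vectors.dense(8.9, 9.2))
    spark.createDataFrame(points.map(Tuple1.apply)).toDF("features")
  }

  // Configure the estimator, then call fit on the training DataFrame.
  def train(training: DataFrame, k: Int = 2): BisectingKMeansModel =
    new BisectingKMeans()
      .setK(k)                    // target number of leaf clusters
      .setSeed(1L)                // fixed seed so repeated runs match
      .setFeaturesCol("features") // column produced by the libsvm source
      .fit(training)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("BisectingKMeansSketch")
      .getOrCreate()
    val training = toyData(spark)
    val model = train(training)
    // One center per leaf cluster found by the repeated bisection
    model.clusterCenters.foreach(println)
    // transform adds a "prediction" column with each row's cluster index
    model.transform(training).select("features", "prediction").show()
    spark.stop()
  }
}
```

Keeping `train` separate from data loading means the same function can be called on the toy DataFrame for a quick check and on `datasetUsers` for the real run.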
