Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Bisecting KMeans

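Bisecting KMeans is a variation of generic KMeans: rather than fitting all clusters at once, it repeatedly splits a cluster into two sub-clusters.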

The steps of the algorithm are:

  1. Initialize by randomly selecting a point, say  then compute the centroid w of M and compute:

The centroid is the center of the cluster. A centroid is a vector containing one number for each variable, where each number is the mean of a variable for the observations in that cluster.
  2. Divide M = [x1, x2, ... xn] into two sub-clusters, ML and MR, according to the following rule:

              ...
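The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's implementation: the formula for the second starting point and the division rule are elided in the text, so the sketch assumes a common formulation in which the second point is the reflection of the first through the centroid w, and each observation goes to whichever of the two points is nearer. The names centroid, bisect, c_left, and c_right are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def bisect(points, seed=0, max_iter=100):
    """Split `points` into two sub-clusters (M_L, M_R).

    Returns a pair of lists. One list may be empty if the randomly
    chosen starting point coincides with the centroid of `points`.
    """
    rng = random.Random(seed)
    w = centroid(points)

    # Step 1: pick a random starting point.
    c_left = list(rng.choice(points))
    # ASSUMPTION: the second starting point is the mirror image of the
    # first through the centroid w (the source elides this formula).
    c_right = [2 * wi - ci for wi, ci in zip(w, c_left)]

    left, right = [], []
    for _ in range(max_iter):
        # Step 2: divide M into M_L and M_R.
        # ASSUMPTION: nearest-point rule, ties go to the left cluster
        # (the source elides the actual rule).
        left, right = [], []
        for x in points:
            if math.dist(x, c_left) <= math.dist(x, c_right):
                left.append(x)
            else:
                right.append(x)
        if not left or not right:
            break  # degenerate split; nothing more to refine

        # Recompute the centroids and stop once they no longer move.
        new_left, new_right = centroid(left), centroid(right)
        if new_left == c_left and new_right == c_right:
            break
        c_left, c_right = new_left, new_right

    return left, right
```

For example, centroid([[0.0, 0.0], [2.0, 4.0]]) averages each variable separately and yields [1.0, 2.0]. Calling bisect on two well-separated groups of points returns the two groups as the sub-clusters; which one comes back as the left cluster depends on the random starting point.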
