k-means where k is unknown

So far, we've been able to define, in advance, how many clusters the algorithm should find. In each example, we've started the project knowing that our data has three clusters, so we've manually programmed a value of 3 for k. This is still a very useful algorithm, but you may not always know how many clusters are represented in your data. To solve this problem, we need to extend the k-means algorithm.

A major reason I included the optional error calculation in our k-means implementation was to help solve this problem. Using an error metric, in any ML algorithm, lets us search not only for a solution but also for the parameters that yield the best solution.
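To make the idea of searching over k concrete, here is a minimal sketch in JavaScript. It is not the book's implementation, which this excerpt doesn't show, and every name in it (`runKMeans`, `chooseK`, `minImprovement`, and so on) is hypothetical. It assumes the error is the sum of squared distances from each point to its assigned centroid. It also swaps in deterministic farthest-first initialization for random initialization, so results are reproducible. One caveat shapes the design: the best achievable error never increases as k grows, and it reaches zero once every distinct point has its own cluster. Picking the k with the lowest raw error would therefore always pick the largest k. The sketch instead uses a simple elbow-style rule, which is an assumption rather than something stated above. It runs k-means for k = 1 up to some maxK and stops at the first k where adding one more cluster reduces the error by less than a fixed fraction (30% by default, an arbitrary choice).

```javascript
// Sketch only: assumes points are equal-length numeric arrays and that the
// "error" is the sum of squared distances from each point to its centroid.
// All function names here are hypothetical, not the book's API.

// Squared Euclidean distance between two points.
function sqDist(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Deterministic farthest-first initialization (swapped in for random init):
// start at the first point, then repeatedly add the point whose distance to
// its nearest already-chosen centroid is largest.
function farthestFirstInit(data, k) {
  const centroids = [data[0].slice()];
  while (centroids.length < k) {
    let bestIdx = 0;
    let bestDist = -1;
    data.forEach((p, i) => {
      const d = Math.min(...centroids.map((c) => sqDist(p, c)));
      if (d > bestDist) {
        bestDist = d;
        bestIdx = i;
      }
    });
    centroids.push(data[bestIdx].slice());
  }
  return centroids;
}

// Plain Lloyd-style k-means for a fixed k; returns the clustering and its error.
function runKMeans(data, k, maxIter = 100) {
  let centroids = farthestFirstInit(data, k);
  const assignments = new Array(data.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    let changed = false;
    // Assignment step: attach each point to its nearest centroid.
    data.forEach((p, i) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
      }
      if (best !== assignments[i]) {
        assignments[i] = best;
        changed = true;
      }
    });
    if (!changed) break; // converged: no point switched clusters
    // Update step: move each centroid to the mean of its points
    // (a centroid with no points stays where it is).
    centroids = centroids.map((c, ci) => {
      const members = data.filter((_, i) => assignments[i] === ci);
      if (members.length === 0) return c;
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  const error = data.reduce((s, p, i) => s + sqDist(p, centroids[assignments[i]]), 0);
  return { k, centroids, assignments, error };
}

// Try k = 1..maxK and keep the first k where one extra cluster improves the
// error by less than minImprovement (elbow-style heuristic; 0.3 is arbitrary).
function chooseK(data, maxK, minImprovement = 0.3) {
  const runs = [];
  for (let k = 1; k <= Math.min(maxK, data.length); k++) runs.push(runKMeans(data, k));
  let best = runs[runs.length - 1];
  for (let i = 0; i < runs.length - 1; i++) {
    const current = runs[i].error;
    const gain = current === 0 ? 0 : (current - runs[i + 1].error) / current;
    if (gain < minImprovement) {
      best = runs[i];
      break;
    }
  }
  return { best, errors: runs.map((r) => r.error) };
}

// Usage: three well-separated groups of four points each.
const points = [
  [-1, -1], [1, -1], [-1, 1], [1, 1],
  [9, 9], [11, 9], [9, 11], [11, 11],
  [19, -1], [21, -1], [19, 1], [21, 1],
];
console.log(chooseK(points, 5).best.k); // 3
```

On this sample data, moving from k = 1 to 2 and from 2 to 3 cuts the error sharply. Going to k = 4 only splits one of the tight groups and improves the error by roughly 11%, so the rule settles on k = 3. On real data, the threshold (or the stopping rule itself) would need tuning.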

In a way, we need to build ...

Get Hands-on Machine Learning with JavaScript now with the O’Reilly learning platform.