April 2016
Beginner to intermediate
384 pages
8h 36m
English
Often, you will not know how many clusters to expect in your data. For two- or three-dimensional data, you could plot the dataset and try to eyeball the clusters. This becomes harder as the number of dimensions grows: beyond three dimensions, you can no longer plot all of the data directly on a single chart.
In this recipe, we will show you how to find the optimal number of clusters for a k-means clustering model. We will use the Davies-Bouldin metric to assess the performance of our k-means models as we vary the number of clusters. The aim is to stop when a minimum of the metric is found.
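For reference, the Davies-Bouldin index of a clustering into k clusters is commonly defined as below. This is the standard textbook definition (it matches scikit-learn's description of `davies_bouldin_score`); the symbols are introduced here for illustration and are not taken from the recipe. Here s_i is the average distance between each point in cluster i and its centroid c_i, and d_ij is the distance between centroids c_i and c_j:

$$
\mathrm{DB} = \frac{1}{k}\sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}}, \qquad d_{ij} = \lVert c_i - c_j \rVert
$$

Compact clusters (small s_i) that are far apart (large d_ij) give a small ratio, so lower values indicate a better clustering. This is why the search looks for a minimum rather than a maximum.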
To follow along, you will need pandas, NumPy, and scikit-learn. No other prerequisites ...
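The search for the number of clusters can be sketched in Python with scikit-learn's `KMeans` and `davies_bouldin_score`. This is a minimal sketch, not the recipe's own code. The synthetic dataset from `make_blobs` (five features, four centers), the candidate range k = 2 to 10, and the choice of taking the overall minimum across that range are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Assumption: synthetic 5-dimensional data with 4 generated centers,
# too many dimensions to eyeball on a single chart.
X, _ = make_blobs(n_samples=500, n_features=5, centers=4,
                  cluster_std=1.0, random_state=42)

scores = {}
# Davies-Bouldin needs at least 2 clusters; the upper bound 10 is arbitrary.
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)

# Collect the scores in a pandas Series indexed by k.
results = pd.Series(scores, name="davies_bouldin")

# Lower is better, so pick the k with the smallest score in the range.
best_k = int(results.idxmin())

print(results.round(3))
print("Best k:", best_k)
```

The index is undefined for a single cluster, so the scan starts at k = 2. If you read "stop when a minimum is found" as stopping at the first local minimum, you could instead break out of the loop as soon as the score for k rises above the score for k - 1. On real data, that can give a different answer from the overall minimum.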