Every modeling technique requires an evaluation phase. For example, we may work hard to develop a multiple regression model for predicting the amount of money to be spent on a new car. However, if the standard error of the estimate, s, for this regression model is $100,000, then the usefulness of the model is questionable. In the classification realm, we would expect a model that predicts who will respond to our direct-mail marketing operation to yield more profitable results than the baseline “send-a-coupon-to-everybody” or “send-out-no-coupons-at-all” models.
Clustering models need to be evaluated in the same way. Some of the questions of interest might be the following:
In this chapter, we introduce two methods for measuring cluster goodness: the silhouette method and the pseudo-F statistic. These techniques help to address these questions by measuring the goodness of our cluster solutions. We also examine a method for validating our clusters using cross-validation with graphical and statistical analysis.
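To make the two measures concrete before they are developed in detail, here is a minimal sketch in plain Python. It assumes the commonly used definitions: a silhouette value of (b - a) / max(a, b) for each record, and a pseudo-F of the form (SSB / (k - 1)) / (SSE / (N - k)). These definitions, the function names mean_silhouette and pseudo_f, and the six-point toy data set are illustrative assumptions, not taken from this chapter.

```python
import math
from collections import defaultdict


def _dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _mean_point(points):
    """Coordinate-wise mean (centroid) of a list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def _group(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters


def mean_silhouette(points, labels):
    """Average silhouette over all records (assumes at least two clusters)."""
    clusters = _group(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the record's own cluster
        # (the sum includes the record itself, at distance 0)
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(_dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def pseudo_f(points, labels):
    """Ratio of between-cluster to within-cluster mean squares.

    Assumes 1 < k < N and that the within-cluster sum of squares is nonzero.
    """
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    grand = _mean_point(points)
    ssb = sse = 0.0
    for members in clusters.values():
        centroid = _mean_point(members)
        ssb += len(members) * _dist(centroid, grand) ** 2
        sse += sum(_dist(p, centroid) ** 2 for p in members)
    return (ssb / (k - 1)) / (sse / (n - k))


# Hypothetical toy data: two visibly separated groups of three records each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good = [0, 0, 0, 1, 1, 1]  # labels that respect the two groups
poor = [0, 1, 0, 1, 0, 1]  # labels that mix the two groups

for name, labels in (("good", good), ("poor", poor)):
    print(name, mean_silhouette(points, labels), pseudo_f(points, labels))
```

On this toy data, both measures come out higher for the labeling that respects the two groups than for the one that mixes them, which is the behavior we want from any measure of cluster goodness.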
Any measure of cluster goodness, or cluster quality, should address the ...