July 2017
Manual inspection of the output is always useful, but it can be quite cumbersome. Often there is some extra data that we can use to evaluate the result of our clustering in a more automatic fashion.
For example, if we use clustering as part of a supervised learning task, then we have labels. If we are solving a classification problem, we can use the class information to measure how pure (or homogeneous) the discovered clusters are. That is, we can look at the ratio of the majority class to the rest of the classes within each cluster.
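The purity idea can be sketched in a few lines of plain Java. This is a minimal illustration, not code from the book: the class name `ClusterPurity`, the method `majorityShare`, and the toy data are all hypothetical. It also reports the share of the majority class within each cluster (majority count divided by cluster size), which is a common variant of the majority-to-rest ratio described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ClusterPurity {

    /**
     * For each cluster id, returns the fraction of its items that belong
     * to the most frequent class in that cluster (1.0 means a pure cluster).
     * Hypothetical helper: clusters[i] is the cluster assigned to item i,
     * labels[i] is the known class of item i.
     */
    public static Map<Integer, Double> majorityShare(int[] clusters, String[] labels) {
        if (clusters.length != labels.length) {
            throw new IllegalArgumentException("clusters and labels must have the same length");
        }

        // Count how often each class occurs inside each cluster.
        Map<Integer, Map<String, Integer>> counts = new TreeMap<>();
        for (int i = 0; i < clusters.length; i++) {
            counts.computeIfAbsent(clusters[i], k -> new HashMap<>())
                  .merge(labels[i], 1, Integer::sum);
        }

        // Divide the count of the majority class by the cluster size.
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Integer>> e : counts.entrySet()) {
            int total = 0;
            int max = 0;
            for (int c : e.getValue().values()) {
                total += c;
                max = Math.max(max, c);
            }
            result.put(e.getKey(), (double) max / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data, made up for illustration only.
        int[] clusters = {0, 0, 0, 0, 1, 1, 1};
        String[] labels = {"a", "a", "a", "b", "b", "b", "b"};
        System.out.println(majorityShare(clusters, labels));
    }
}
```

If the majority-to-rest ratio is preferred, it can be obtained from the same counts as `max / (total - max)`, keeping in mind that it is undefined for a perfectly pure cluster.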
The complaints dataset contains some variables that we did not use for clustering, for example: