This measure does not require the ground truth and evaluates the intra-cluster cohesion and the inter-cluster separation at the same time. To define the Silhouette score, we need to introduce two auxiliary functions. The first one is the average intra-cluster distance of a sample xi belonging to a cluster Cj:
$$a(x_i) = \frac{1}{n(j)} \sum_{x_t \in C_j} d(x_i, x_t)$$

In the previous expression, n(j) is the number of samples assigned to the cluster Cj and d(a, b) is a standard distance function (in most cases, the Euclidean distance is chosen). We also need to define the lowest inter-cluster distance, which can be interpreted as the average nearest-cluster ...