February 2019
A simple yet powerful tool for assessing the performance of a clustering algorithm when the ground truth is known is the contingency matrix C_m. If there are m classes (and as many clusters), C_m ∈ ℝ^(m × m) and each element C_m(i, j) is the number of samples with Y_true = i that have been assigned to cluster j. Hence, a perfect contingency matrix is diagonal (up to a relabeling of the clusters, whose indices are arbitrary), while nonzero elements in the other cells indicate clustering errors.
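To make the definition concrete, here is a minimal sketch that builds the matrix by hand on a toy labeling. The helper name contingency and the six toy labels are assumptions made purely for illustration; they are not part of the dataset discussed in the text.

```python
def contingency(y_true, y_pred, n_classes, n_clusters):
    """Hypothetical helper: cm[i][j] counts the samples whose true
    label is i and whose assigned cluster is j."""
    cm = [[0] * n_clusters for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# Toy labels (illustrative only): one sample of class 0 ends up in cluster 1
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

cm = contingency(y_true, y_pred, n_classes=2, n_clusters=2)

# Off-diagonal cells hold the misassigned samples
off_diagonal = sum(
    cm[i][j] for i in range(2) for j in range(2) if i != j
)

print(cm)            # [[2, 1], [0, 3]]
print(off_diagonal)  # 1
```

The single off-diagonal count corresponds to the one sample of class 0 that was placed in cluster 1. Since cluster indices are arbitrary, a real evaluation may first need to match each cluster to its majority class before reading the diagonal.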
In our case, we obtain the following:
from sklearn.metrics.cluster import contingency_matrix

cm = contingency_matrix(kmdff['diagnosis'].apply(lambda x: 0 if x == 'B' else 1),
                        kmdff['prediction'])
The output of the previous snippet can be visualized as a heat map (the variable cm is a (2 × 2) matrix): ...
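One possible way to draw such a heat map is sketched below with plain matplotlib; this is an assumption about tooling, not necessarily the plotting code used for the original figure. The array cm_toy contains made-up counts so that the sketch runs on its own, and it should be replaced with the actual cm computed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Made-up counts for illustration only, NOT the values obtained in the text
cm_toy = np.array([[5, 1],
                   [2, 4]])

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(cm_toy, cmap="Blues")
fig.colorbar(im, ax=ax)

# Rows are true classes, columns are assigned clusters
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xlabel("Cluster")
ax.set_ylabel("True class")

# Write each count inside its cell
for i in range(cm_toy.shape[0]):
    for j in range(cm_toy.shape[1]):
        ax.text(j, i, str(cm_toy[i, j]), ha="center", va="center")

fig.savefig("contingency_heatmap.png")
```

Annotating every cell keeps the exact counts readable even when the color scale is dominated by the largest entries.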