Analysis of the Breast Cancer Wisconsin dataset
In this chapter, we use the well-known Breast Cancer Wisconsin dataset to perform a cluster analysis. The dataset was originally proposed for training classifiers; however, it is also well suited to a non-trivial cluster analysis. It contains 569 records, each made up of 32 attributes (including the diagnosis and an identification number). All the attributes are strictly related to the biological and morphological properties of the tumors, but our goal is to validate generic hypotheses by considering the ground truth (benign or malignant) together with the statistical properties of the dataset. Before moving on, it's important to clarify some points. The dataset is high-dimensional and the clusters are ...
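The loading code used in the chapter is not shown here, so the following is only a minimal sketch, assuming scikit-learn's bundled copy of the dataset (`load_breast_cancer`). In that copy the identification number is dropped and the diagnosis is stored separately as the target, which is why the feature matrix has 30 columns rather than 32. The two-cluster K-means run and the adjusted Rand index (ARI) are an illustrative baseline chosen for this example, not necessarily the method the chapter adopts; `random_state=1000` is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# scikit-learn's copy of the dataset: the ID is dropped and the diagnosis
# is returned as the target, so 30 numeric attributes remain as features.
data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)  # (569, 30)

# Ground truth: in this copy, label 0 = malignant and label 1 = benign.
labels, counts = np.unique(y, return_counts=True)
class_counts = {str(data.target_names[l]): int(c) for l, c in zip(labels, counts)}
print(class_counts)  # {'malignant': 212, 'benign': 357}

# The attributes live on very different scales, so standardize them first.
X_std = StandardScaler().fit_transform(X)

# Illustrative baseline (an assumption of this sketch): K-means with k=2.
km = KMeans(n_clusters=2, n_init=10, random_state=1000)
clusters = km.fit_predict(X_std)

# The ARI compares the clustering with the diagnosis; it ignores how the
# cluster indices happen to be numbered (1.0 = perfect agreement).
ari = adjusted_rand_score(y, clusters)
print(f"ARI: {ari:.3f}")
```

Comparing the clusters with the diagnosis through a permutation-invariant score such as the ARI is one simple way to relate an unsupervised result to the ground truth, since K-means has no notion of which cluster should be called "benign".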