1 A Topological Clustering of Variables

Clustering objects (individuals or variables) is one of the most widely used approaches to exploring multivariate data. The two most common unsupervised clustering strategies are hierarchical ascending clustering (HAC) and k-means partitioning, both of which identify similar objects in a dataset in order to divide it into homogeneous groups.

The proposed topological clustering of variables (TCV) studies a homogeneous set of variables defined on the same set of individuals, using the notion of neighborhood graphs. Some of these variables are more or less correlated or associated, depending on whether they are quantitative or qualitative. This topological data analysis approach can thus be useful for dimension reduction and variable selection. It is a topological hierarchical clustering analysis of a set of variables, which can be quantitative, qualitative or a mixture of both. It arranges the variables into homogeneous groups according to their correlations or associations, studied in the topological context of principal component analysis (PCA) or multiple correspondence analysis (MCA). The proposed TCV is adapted to the type of data considered; its principle is presented and illustrated using simple real datasets with quantitative, qualitative and mixed variables. The results of these illustrative examples are compared with those of other variable clustering approaches.
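To make the idea of grouping variables by their correlations concrete, the following is a minimal sketch of a conventional, non-topological baseline: hierarchical ascending clustering of quantitative variables on the dissimilarity 1 - |r|. It is not the TCV procedure itself, which relies on neighborhood graphs. The function name `cluster_variables`, the choice of average linkage and the synthetic data are all assumptions made purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(X, n_clusters):
    """Group the columns (variables) of X by HAC on the dissimilarity 1 - |r|.

    Illustrative baseline only; this is not the TCV method.
    """
    corr = np.corrcoef(X, rowvar=False)      # variable-by-variable correlations
    dissim = 1.0 - np.abs(corr)              # strongly correlated -> close to 0
    dissim = (dissim + dissim.T) / 2.0       # remove floating-point asymmetry
    np.fill_diagonal(dissim, 0.0)
    dissim = np.clip(dissim, 0.0, None)      # guard against tiny negative values
    condensed = squareform(dissim, checks=False)
    tree = linkage(condensed, method="average")  # assumed linkage choice
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic data (hypothetical): two latent factors, two variables per factor.
rng = np.random.default_rng(0)
n = 200
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack([
    f1 + 0.1 * rng.normal(size=n),
    f1 + 0.1 * rng.normal(size=n),
    f2 + 0.1 * rng.normal(size=n),
    -f2 + 0.1 * rng.normal(size=n),   # negatively correlated, still same group
])

labels = cluster_variables(X, n_clusters=2)
print(labels)
```

Using the absolute correlation places a variable and its negation in the same group, which is usually what is wanted when the goal is dimension reduction. Qualitative or mixed variables would need a different association measure, which this sketch does not cover.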

1.1. Introduction

The objective of this chapter is to propose a ...

