9 A Novel Clustering Method with Automatic Weighting of Tables and Variables
9.1. Introduction
Cluster analysis is one of the most important methods for unsupervised learning from data and is widely used in areas such as pattern recognition, bioinformatics, data mining, and image processing. Clustering aims to produce homogeneous clusters, such that the similarity between objects within the same group is high and the similarity between objects belonging to different groups is low [JAI 10]. The main objective of data clustering is to partition a set of objects into groups while minimizing an objective criterion W that measures the homogeneity of the partition.
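To make the role of such a criterion concrete, the following minimal sketch evaluates one common choice of W on a toy dataset. The excerpt does not define W, so the choice made here (the sum of squared Euclidean distances from each object to its cluster centroid), the function name `within_cluster_dispersion`, and the sample points are all illustrative assumptions rather than the method developed in this chapter.

```python
from collections import defaultdict


def within_cluster_dispersion(points, labels):
    """Assumed criterion W: sum of squared Euclidean distances
    from each point to the centroid of its assigned cluster."""
    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # Add the squared distance of every member to the centroid.
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


# Hypothetical toy data: two visually obvious groups along the x-axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Partition that groups nearby points together.
w_compact = within_cluster_dispersion(points, [0, 0, 1, 1])
# Partition that mixes distant points in the same cluster.
w_mixed = within_cluster_dispersion(points, [0, 1, 0, 1])

print(w_compact, w_mixed)
```

Under this assumed definition, the partition that groups nearby objects together yields the smaller value of W, which is the sense in which minimizing the criterion favors homogeneous clusters.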
Problems involving the classification of complex data appear every day, so new clustering models need to consider different perspectives, views, or tables to cope with these problems. It is increasingly common for objects to be described by separate tables, each from a different perspective or view. This approach can be used to solve clustering problems in many application fields. A perspective or view is described by a set of variables; the corresponding table holds the values of that user-defined set of variables on an observed sample.
For example, in a multi-source approach, different representations of several sensors or signatures (Fourier and Karhunen coefficients) can be proposed to describe the same observation. Each of these datasets can be considered as a separate ...