Chapter 2

State of the Art in Clustering and Semi-Supervised Techniques

2.1. Introduction

This chapter provides a survey of the most popular techniques for unsupervised machine learning (clustering) and their applications and introduces the main existing approaches in the field of semi-supervised classification.

2.2. Unsupervised machine learning (clustering)

Exploring large amounts of data to extract meaningful information about its group structure by considering similarities and differences between data entities has been a relevant subject under investigation since the early 1930s.

Each one of the data groups to be identified by the exploratory analysis is referred to as “cluster,” and the framework of related techniques is known under the name “cluster analysis.”

Formally, cluster analysis has been defined as:

“the organisation of a collection of patterns (usually represented as a vector of measurements, or a point in a multi-dimensional space) into clusters based on similarity.” [JAI 99]

2.3. A brief history of cluster analysis

The first studies about cluster analysis were conducted in the field of analytical psychology. There is some controversy in the literature about the attribution of the initial clustering schemes. However, according to D. E. Bailey [TRY 70], it seems to be Robert C. Tryon who originally conceived cluster analysis and its first application to psychological data. Tryon envisaged and formulated algorithms to group a set of intellectual abilities in humans, ...

Get Semi-Supervised and Unsupervised Machine Learning: Novel Strategies now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.