11 Cluster Analysis

The Milky Way is nothing more than a mass of innumerable stars planted together in clusters.

Galileo Galilei

11.1. Introduction

In the previous two chapters, we examined the use of supervised learning techniques, regression and classification, to build models from large sets of previously known observations. This means that the target values (the class labels, in the case of classification) were already available in the datasets.

In this chapter, we’ll explore cluster analysis, or “clustering”. This type of analysis comprises a set of unsupervised learning techniques for discovering hidden structure in unlabeled data.

Clustering seeks natural groupings in the data, so that elements of the same group, or cluster, are more similar to each other than to elements of other groups.
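To make the “more similar within than between” criterion concrete, the following minimal sketch measures dissimilarity as Euclidean distance (an assumption; many other measures are used in practice) and compares the average distance between points in the same cluster with the average distance between points in different clusters. The toy points and the helper names `mean_within_distance` and `mean_between_distance` are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: 2-D points already split into two groups.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)],
}


def mean_within_distance(groups):
    """Average Euclidean distance between pairs of points in the same cluster."""
    pairs = [(p, q) for pts in groups.values() for p, q in combinations(pts, 2)]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(groups):
    """Average Euclidean distance between pairs of points in different clusters."""
    pairs = [
        (p, q)
        for a, b in combinations(groups, 2)  # every pair of distinct cluster names
        for p in groups[a]
        for q in groups[b]
    ]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(f"within={within:.2f} between={between:.2f}")
```

A good clustering keeps the first number small relative to the second; clustering algorithms search for groupings that achieve this without being told the labels in advance.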

Given its exploratory nature, cluster analysis is a fascinating subject. In this chapter, you will learn the key concepts that help organize data into meaningful structures, and we’ll address two techniques for cluster analysis, namely:

  • grouping similar observations with the k-means algorithm;
  • building hierarchical classification trees with a bottom-up (agglomerative) approach.
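The two techniques listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the chapter’s own implementation: the hypothetical `kmeans` function follows the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points) with Euclidean distance and randomly chosen initial centroids, and the hypothetical `agglomerative` function starts from singleton clusters and repeatedly merges the closest pair using single linkage, which is only one of several possible merge criteria.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start from singletons, repeatedly merge the closest pair."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []  # merge history, i.e. the tree built from the bottom up
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of two clusters.
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # merged cluster replaces cluster i
        del clusters[j]  # safe because j > i
    return clusters, merges


# Hypothetical toy data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
groups, merges = agglomerative(points, n_clusters=2)
print("k-means labels:", labels)
print("hierarchical groups:", groups)
```

On well-separated data like this, both functions recover the same two groups. On real data, k-means results depend on the initial centroids and the choice of k, while the hierarchical result depends on the linkage criterion; keeping the full merge history lets the tree be cut at any number of clusters.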

As you’ll see, the best way to appreciate the importance of cluster analysis for sharing economy companies is to apply these two algorithms to a use case.

In what follows, we invite you to learn more about this type of unsupervised learning.

11.2. Cluster ...
