June 2017
Beginner to intermediate
576 pages
15h 22m
English
Clustering is a method that groups data into classes so that the observations within each class are similar to one another. Similarity can be defined in various ways. K-Means clustering is probably the most popular clustering method. It uses measured distances to assign each observation to the class whose center is closest. Clustering is often used in marketing to develop customer segments.
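The distance-based assignment described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names, the sample customer data (spend and visits), and the choice of starting centers are all assumptions made for the example.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of the closest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            updated.append(old)  # keep an empty cluster's centroid where it was
    return updated

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign_to_nearest(points, centroids)
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return assign_to_nearest(points, centroids), centroids

# Hypothetical customers: (spend, visits), made up for illustration.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumption: start from two of the observations as initial centers.
labels, centroids = k_means(customers, [customers[0], customers[3]])
print(labels)
print(centroids)
```

Each customer ends up labeled with the index of its nearest center, which is the "segment" in the marketing sense. Real tools typically pick the starting centers for you and may rescale the variables first; neither step is shown here.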
Clustering is an unsupervised technique, and its results are subjective. You can specify beforehand how many groups you wish to cluster into. This number is somewhat arbitrary, and if the goal is interpretability, different choices can lead to different interpretations.
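To see why the number of groups is a judgment call, the sketch below scores two hand-made groupings of the same values by their within-cluster sum of squares (the total squared distance from each value to its group mean). The spend figures and both groupings are hypothetical, chosen only for illustration.

```python
def within_cluster_ss(groups):
    """Sum of squared distances from each value to the mean of its group."""
    total = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total

# Hypothetical monthly spend per customer, split two different ways.
two_groups = [[10, 12, 14, 30, 32], [70, 75]]      # "regular" vs "high" spenders
three_groups = [[10, 12, 14], [30, 32], [70, 75]]  # "low", "medium", "high"

ss_two = within_cluster_ss(two_groups)
ss_three = within_cluster_ss(three_groups)
print(ss_two, ss_three)
```

The three-group split scores lower, but splitting a group further can never raise this score, so a tighter fit alone does not settle how many groups to use. Whether "regular versus high" or "low, medium, high" is the more useful segmentation depends on what the analysis is for.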
Scatterplots are often used to show data clusters using only two variables (one ...