July 2016
Beginner to intermediate
462 pages
9h 14m
English
In Python Data Analysis, you learned about clustering (separating data into clusters without providing any hints), which is a form of unsupervised learning. Sometimes, we need to guess the number of clusters, as we did in the Clustering streaming data with Spark recipe.
There is no restriction against having clusters contain other clusters. In such a case, we speak of hierarchical clustering. We need a distance metric to separate data points. Take a look at the following equations:
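For reference, the Euclidean distance between two points has the standard form sketched below. The symbols a, b, and n (the number of dimensions) are notation assumed here for illustration, and the label (9.2) is taken from the reference in the text rather than from a surviving equation:

```latex
% Euclidean distance between points a and b in n dimensions (assumed notation)
\begin{equation}
d(\mathbf{a}, \mathbf{b}) = \lVert \mathbf{a} - \mathbf{b} \rVert_2
  = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
\tag{9.2}
\end{equation}
```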

In this recipe, we will use Euclidean distance (9.2), provided by the SciPy pdist() function. The distance between sets of points ...
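As a minimal sketch of how pdist() computes Euclidean distances, consider the following snippet. The three points are hypothetical values chosen so that the distances are easy to check by hand; they are not the recipe's data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three hypothetical 2-D points (illustrative only, not the recipe's dataset).
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# pdist() returns the pairwise distances in condensed form:
# one entry per unordered pair, ordered (0, 1), (0, 2), (1, 2).
condensed = pdist(points, metric="euclidean")

# squareform() expands the condensed vector into a symmetric
# distance matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed)
print(square)
```

The condensed form stores each pair only once, which is also the input format that the SciPy hierarchical clustering functions accept.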