April 2018
Beginner to intermediate
282 pages
6h 52m
English
We can use scikit-learn to perform hierarchical clustering in Python. To create the clusters, we import the AgglomerativeClustering class from sklearn.cluster. Hierarchical clustering works on distance measures, so categorical data must be converted to a suitable numeric format before the model is built. Here we use one-hot encoding to convert a categorical attribute to numeric form; various other methods exist for this task, and the topic is covered in detail in the next chapter:
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.cluster import AgglomerativeClustering
hr_data = pd.read_csv('data/hr.csv', header=0)
hr_data.head()
hr_data = hr_data.dropna()
...
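
Because the listing above is truncated, the encoding and clustering steps described in the text can be sketched on a small made-up table. This is only an illustration under stated assumptions: the toy DataFrame, its column names (satisfaction, department), and the choice of two clusters are hypothetical and are not taken from hr.csv. It uses pandas' get_dummies for the one-hot encoding, which may differ from the approach the book takes.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data; the column names are illustrative, not from hr.csv.
toy = pd.DataFrame({
    "satisfaction": [0.10, 0.15, 0.90, 0.85, 0.12, 0.88],
    "department": ["sales", "sales", "it", "it", "sales", "it"],
})

# One-hot encode the categorical column so that every feature is numeric
# and distances between rows are well defined.
encoded = pd.get_dummies(toy, columns=["department"], dtype=float)

# Merge rows bottom-up into two clusters using Ward linkage.
# n_clusters=2 is an assumption chosen for this toy example.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(encoded)

print(encoded.columns.tolist())
print(labels)
```

In this sketch the rows from the two departments fall into separate clusters, since they differ both in the numeric column and in the encoded indicator columns. The integer assigned to each cluster is arbitrary.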