Clustering is the usual starting point for unsupervised machine learning. This lesson introduces the k-means and hierarchical clustering algorithms, implemented in Python code.
Why is it important?
Whenever you look at a data source, it's likely that the data will somehow form clusters. Datasets with higher dimensions become increasingly difficult to "eyeball" using human perception and intuition. These clustering algorithms let you discover similarities within data at scale, without first having to label a large training dataset.
What you'll learn—and how you can apply it
Understand how the k-means and hierarchical clustering algorithms work. Create classes in Python to implement these algorithms, and learn how to apply them in example applications. Identify clusters of similar inputs, and find a representative value for each cluster. Prepare to use your own implementations or reuse algorithms implemented in scikit-learn.
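To make "identify clusters of similar inputs, and find a representative value for each cluster" concrete, here is a minimal k-means sketch in plain Python. It is not the lesson's own code: the names `KMeans`, `squared_distance`, and `vector_mean`, and the `seed` and `max_iter` parameters, are assumptions made for illustration. The block defines its own small vector helpers so that it runs on its own.

```python
import random
from typing import List, Optional

Vector = List[float]


def squared_distance(v: Vector, w: Vector) -> float:
    """Sum of squared component-wise differences between v and w."""
    return sum((a - b) ** 2 for a, b in zip(v, w))


def vector_mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


class KMeans:
    """Illustrative k-means: each cluster is represented by its mean."""

    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self.means: List[Vector] = []
        self._rng = random.Random(seed)  # hypothetical seed for repeatable runs

    def classify(self, point: Vector) -> int:
        """Return the index of the cluster whose mean is closest to point."""
        return min(range(self.k),
                   key=lambda i: squared_distance(point, self.means[i]))

    def train(self, inputs: List[Vector], max_iter: int = 100) -> None:
        # Start from k distinct input points chosen at random.
        self.means = self._rng.sample(inputs, self.k)
        assignments: Optional[List[int]] = None

        for _ in range(max_iter):
            # Assign every point to its nearest mean.
            new_assignments = [self.classify(p) for p in inputs]
            if new_assignments == assignments:
                return  # no point changed cluster, so we have converged
            assignments = new_assignments

            # Move each mean to the average of the points assigned to it.
            for i in range(self.k):
                members = [p for p, a in zip(inputs, assignments) if a == i]
                if members:  # keep the old mean if a cluster ends up empty
                    self.means[i] = vector_mean(members)


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    model = KMeans(k=2)
    model.train(points)
    print(model.means)
```

On this small, well-separated example the two means should settle near (0.33, 0.33) and (10.33, 10.33); which one gets index 0 depends on the random starting points. In general k-means can converge to different clusterings from different starting points, so a real application would typically try several runs.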
This lesson is for you because…
- You're interested in data science and want to learn how to implement k-means and bottom-up hierarchical clustering algorithms
- You have some experience writing code in Python
- You have experience working with data in vector or matrix format
Materials or downloads needed in advance
- Download this code. This lesson's code is in Chapter 19, and you'll also need the linear_algebra functions from Chapter 4.
This lesson is taken from Data Science from Scratch by Joel Grus.
- Title: K-means and hierarchical clustering with Python
- Release date: August 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491966174