Pandas for Everyone: Python Data Analysis, First Edition

by Daniel Y. Chen
December 2017
Beginner to intermediate
410 pages
12h 45m
English
Addison-Wesley Professional
Content preview from Pandas for Everyone: Python Data Analysis, First Edition

16. Clustering

16.1 Introduction

Machine learning methods generally fall into two main categories: supervised learning and unsupervised learning. Thus far, we have been working with supervised learning models, because we train them with a target, or response, variable y. In other words, the training data for our models include the “correct” answer. Unsupervised models are modeling techniques for cases in which the “correct” answer is unknown. Many of these methods involve clustering, and the two main clustering methods are k-means clustering and hierarchical clustering.
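The distinction can be made concrete with a small sketch. The data, the 1-nearest-neighbor rule, and the "largest gap" grouping rule below are all invented for illustration and are not taken from the book; they only show that a supervised method consumes the known answers, while an unsupervised method sees the observations alone.

```python
# Toy 1-D data, invented purely for illustration.
heights = [150.0, 152.0, 155.0, 180.0, 183.0, 185.0]

# Supervised: every observation comes with a known answer (the target y).
labels = ["short", "short", "short", "tall", "tall", "tall"]


def predict_nearest(x, train_x, train_y):
    """Hypothetical 1-nearest-neighbor rule: copy the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


supervised_guess = predict_nearest(178.0, heights, labels)


# Unsupervised: there are no labels, so we can only look for structure in the values.
def split_at_largest_gap(values):
    """Hypothetical grouping rule: sort the values and cut at the widest gap."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


group_a, group_b = split_at_largest_gap(heights)
```

Note that the unsupervised function returns two groups but cannot name them: without a target variable, interpreting the groups is left to the analyst.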

16.2 k-Means

The technique known as k-means works by first specifying how many clusters, k, to look for in the data. The algorithm randomly selects k points in the data and ...
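The preview cuts off here, so the following is not the book's own continuation. It is a minimal sketch of the k-means procedure as it is commonly described (Lloyd's algorithm): start from k randomly chosen data points, assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. The function name `kmeans`, its parameters, and the sample points are assumptions made for illustration.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd-style k-means on a list of (x, y) tuples (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: randomly pick k of the data points as the starting centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Stop once the assignments no longer change between iterations.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


# Made-up points forming two visually separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, assignments = kmeans(points, k=2)
```

Because the starting centroids are random, the cluster numbering (and, on less clearly separated data, the clusters themselves) can differ from run to run; fixing `seed` only makes a single run reproducible. In practice a library implementation such as scikit-learn's `sklearn.cluster.KMeans` is the usual choice over hand-written code.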


ISBN: 9780134547046