Clustering

In this example, we will look at a cluster-finding algorithm in Scikit-learn called DBSCAN. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a clustering algorithm that groups together points lying in dense regions and labels points that fall outside every group (cluster) as noise (outliers). As with the linear machine learning methods, Scikit-learn makes it very easy to work with. We first read in the data from Chapter 5, Clustering, with Pandas' read_pickle function:

import pandas as pd

TABLE_FILE = 'data/test.pick'
mycat = pd.read_pickle(TABLE_FILE)
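Before turning to the galaxy data, it may help to see how DBSCAN separates clusters from noise on a small toy case. The following is only a minimal sketch on synthetic points, not the analysis from this chapter: the two blobs, the three isolated points, and the eps and min_samples values are all assumptions chosen for illustration. DBSCAN marks noise points with the label -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): two tight blobs
# plus three isolated points far away from both.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
outliers = np.array([[2.5, 2.5], [-3.0, 4.0], [8.0, -1.0]])
points = np.vstack([blob_a, blob_b, outliers])

# eps is the neighborhood radius and min_samples the number of neighbors
# needed for a dense region; both values here are illustrative guesses.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)

# Cluster labels are 0, 1, ...; the label -1 is reserved for noise.
n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

On real data, eps and min_samples usually have to be tuned to the scale and density of the points, so the values above should not be carried over as they are.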

To refresh your memory, we plot the data, as we did with the previous dataset. It contains a slice of the mapped nearby Universe, that is, galaxies with determined positions (direction and ...


Python: End-to-end Data Analysis by Phuong Vothihong, Martin Czygan, Ivan Idris, Magnus Vilhelm Persson, Luiz Felipe Martins

