Clustering

In this example, we will look at a cluster finding algorithm in Scikit-learn called DBSCAN. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise, and is a clustering algorithm that favors groups of points and can identify points outside any of these groups (clusters) as noise (outliers). As with the linear machine learning methods, Scikit-learn makes it very easy to work with it. We first read in the data from  Chapter 5 Clustering, with Pandas' read_pickle function:

TABLE_FILE = 'data/test.pick' 
mycat = pd.read_pickle(TABLE_FILE) 

As with the previous dataset, to refresh your memory, we plot the data. It contains a slice of the mapped nearby Universe, that is, galaxies with determined positions (direction and ...

Get Python: End-to-end Data Analysis now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.