17 Clustering

Measurements extracted from biological systems may depend on many variables in ways that are not yet understood. One way to analyze such data sets is to group data vectors that are similar. Once a group is collected, it can be analyzed further to find the reasons for the similarity. Clustering is often used to create these groups, and the most common clustering method is the k-means algorithm. This chapter focuses on the development and use of the k-means method and some useful extensions.
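To make the idea of grouping similar data vectors concrete, here is a minimal sketch of the standard k-means loop in plain Python. It is not the implementation developed in this chapter. The function name `kmeans`, its parameters, the random choice of initial centers, and the toy data are all assumptions made for illustration.

```python
import math
import random


def kmeans(vectors, k, n_iter=100, seed=0):
    """Group vectors into k clusters; returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data vectors chosen at random.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [None] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest center
        # (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(v, centers[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical toy data: two well-separated groups of 2-D vectors.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, labels = kmeans(data, k=2)
print(labels)
```

On this toy data the first three vectors receive one label and the last three receive the other. Which group is numbered 0 depends on the random initialization, so only the grouping is meaningful, not the label values.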

17.1 The Purpose of Clustering

Given a set of data vectors, the object is to group the ...
