Reducing dimensions in the data

To visualize the distribution of unlabeled data whose input values have many dimensions, we must first reduce the number of feature dimensions to two or three. Once the input data has only two or three dimensions, we can plot it directly, which gives a far more understandable picture of how the samples are distributed. This process of reducing the number of dimensions in the input data is called dimensionality reduction. Because it reduces the number of values needed to represent each sample, dimensionality reduction is also useful for data compression.
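The following is a minimal sketch of dimensionality reduction, written in plain Python with no third-party libraries. It assumes the data is a list of equal-length numeric rows. It reduces each row to two coordinates by projecting onto the two directions of greatest variance, which is the approach taken by Principal Component Analysis. Those directions are found here with power iteration and deflation, a simple technique chosen for illustration only. All function names (`reduce_dimensions`, `top_eigenvector`, and so on) are hypothetical helpers, not part of any library.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [dot(row, v) for row in m]


def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def mean_center(rows):
    """Subtract each column's mean so every feature is centred on zero."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance(centred):
    """Sample covariance matrix of already-centred rows."""
    n, dims = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def top_eigenvector(m, iterations=500, seed=0):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in m])
    for _ in range(iterations):
        w = mat_vec(m, v)
        if dot(w, w) == 0.0:
            break  # no variance left in any direction v can reach
        v = normalize(w)
    return dot(v, mat_vec(m, v)), v  # (Rayleigh quotient, unit eigenvector)


def deflate(m, eigenvalue, v):
    """Subtract a found component so the next power iteration finds the next one."""
    d = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]


def reduce_dimensions(rows, k=2):
    """Project rows onto their k directions of greatest variance."""
    centred = mean_center(rows)
    cov = covariance(centred)
    components = []
    for _ in range(k):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        cov = deflate(cov, eigenvalue, v)
    # Each output row holds k coordinates instead of the original feature count.
    return [[dot(row, c) for c in components] for row in centred]


if __name__ == "__main__":
    # Hypothetical sample: 3-D points that happen to lie on the plane z = x + y.
    rows = [[float(i), float(j), float(i + j)]
            for i in range(-5, 6) for j in range(-2, 3)]
    for point in reduce_dimensions(rows)[:3]:
        print(point)
```

Each reduced row is a pair of coordinates that can be passed straight to a 2-D scatter plot. It also takes two numbers to store instead of three, which is the compression benefit mentioned above. In this made-up sample, every point lies exactly on a plane, so the projection preserves all pairwise distances. For real data, discarding dimensions generally loses some information.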

Principal Component Analysis (PCA) is a form of dimensionality reduction in which the input variables in the sample data are ...
