The covariance matrix

The covariance matrix gives you an idea of how each pair of features varies together. Computing it is usually the first step of dimensionality reduction because it shows how many features are strongly related (and are therefore candidates for being discarded) and which ones are largely uncorrelated. Using the Iris dataset, where each observation has four features, a correlation matrix (the covariance matrix normalized by the standard deviations of the features) can be computed easily, and you can understand its results with the help of a simple graphical representation, which can be obtained with the following code:

In: from sklearn import datasets
    import numpy as np
    iris = datasets.load_iris()
    cov_data = np.corrcoef(iris.data.T)
    print ...
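As a complement to the snippet above, here is a minimal sketch (not the book's own listing) of how the correlation matrix relates to the covariance matrix and how it might be used to spot strongly related feature pairs. The names `corr_from_cov`, `strong_pairs`, and the `THRESHOLD` value of 0.8 are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data  # 150 observations, 4 features

# rowvar=False treats each column as a variable,
# which is equivalent to np.corrcoef(X.T) as used in the text.
corr = np.corrcoef(X, rowvar=False)
cov = np.cov(X, rowvar=False)

# The correlation matrix is the covariance matrix divided by the
# product of the standard deviations of each pair of features.
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

# Hypothetical cutoff for calling two features "strongly related".
THRESHOLD = 0.8

# Scan the upper triangle only, so each pair is reported once.
strong_pairs = [
    (iris.feature_names[i], iris.feature_names[j], corr[i, j])
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > THRESHOLD
]

for name_a, name_b, r in strong_pairs:
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

Any cutoff of this kind is a judgment call; a lower threshold flags more pairs as redundant, so it is worth checking the full matrix before discarding a feature.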
