Selection based on feature variance

This method is the simplest approach to feature selection, and it's often used as a baseline. It simply removes all the features whose variance falls below a given threshold. By default, the VarianceThreshold transformer removes only zero-variance features, but you can control the cutoff with its threshold parameter.

Let's create a small dataset composed of 10 observations and 5 features, 3 of them informative:

In: from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=10, n_features=5,
                               n_informative=3, n_redundant=0,
                               random_state=101)

Now, let's measure their variance:

In: import numpy as np
    print("Variance:", np.var(X, axis=0))

Out: Variance: [ 2.50852168 1.47239461 0.80912826 1.51763426 ...
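Having seen the per-feature variances, a minimal sketch of applying VarianceThreshold to this dataset follows; the cutoff of 1.0 is an illustrative choice, not a value from the text:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold

# Recreate the same toy dataset as above
X, y = make_classification(n_samples=10, n_features=5,
                           n_informative=3, n_redundant=0,
                           random_state=101)

# Keep only features whose variance exceeds 1.0
# (1.0 is an illustrative cutoff, not a recommended default)
selector = VarianceThreshold(threshold=1.0)
X_selected = selector.fit_transform(X)

print("Original shape:", X.shape)
print("Reduced shape:", X_selected.shape)
# Boolean mask of which features were kept
print("Kept features:", selector.get_support())
```

Every column that survives the transform is guaranteed to have a training-set variance strictly above the threshold, so the feature with variance 0.80912826 seen above would be discarded with this cutoff.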
