Selection based on feature variance

This method is the simplest approach to feature selection, and it's often used as a baseline. It removes all the features whose variance falls below a given threshold. By default, the VarianceThreshold object removes all zero-variance features, but you can control the cutoff with its threshold parameter.
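As a minimal sketch of the default behavior, consider a toy matrix with one constant column (the matrix and its values are invented for illustration): with no arguments, VarianceThreshold drops exactly the zero-variance features.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: the second column is constant, so its variance is zero
X_toy = np.array([[1.0, 7.0],
                  [2.0, 7.0],
                  [3.0, 7.0]])

selector = VarianceThreshold()  # default threshold=0.0
X_reduced = selector.fit_transform(X_toy)

print(X_reduced.shape)  # the constant column is dropped: (3, 1)
```

Note that a feature is kept only if its variance is strictly greater than the threshold, which is why a variance of exactly zero is removed under the default threshold of 0.0.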

Let's create a small dataset composed of 10 observations and 5 features, 3 of them informative:

In: from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=10, n_features=5,
                               n_informative=3, n_redundant=0,
                               random_state=101)

Now, let's measure their variance:

In: import numpy as np
    print("Variance:", np.var(X, axis=0))

Out: Variance: [ 2.50852168 1.47239461 0.80912826 1.51763426 ...
