September 2018
Intermediate to advanced
472 pages
12h 2m
English
This method is the simplest approach to feature selection, and it's often used as a baseline. It simply removes all the features whose variance is lower than a set threshold. By default, the VarianceThreshold object removes only the zero-variance (constant) features, but you can change this with the threshold parameter.
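As a minimal sketch of the default versus a custom threshold, the block applies scikit-learn's VarianceThreshold to a small hand-made matrix. The matrix, the variable names, and the threshold of 1.0 are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy matrix: column 0 is constant, column 1 varies a little,
# column 2 varies a lot. Column variances (ddof=0) are 0.0, 0.5 and 7.5.
X_toy = np.array([[0.0, 2.0, 1.0],
                  [0.0, 1.0, 4.0],
                  [0.0, 3.0, 7.0],
                  [0.0, 2.0, 0.0]])

print("Variances:", np.var(X_toy, axis=0))

# Default threshold (0.0): only the zero-variance column is dropped.
default_selector = VarianceThreshold()
X_default = default_selector.fit_transform(X_toy)

# Illustrative threshold of 1.0: the low-variance column 1 is dropped as well.
strict_selector = VarianceThreshold(threshold=1.0)
X_strict = strict_selector.fit_transform(X_toy)

# get_support() returns a boolean mask of the features that were kept.
print("Default mask:", default_selector.get_support())
print("Strict mask: ", strict_selector.get_support())
print("Shapes:", X_default.shape, X_strict.shape)
```

Raising the threshold only ever removes more columns, so the default setting is the most conservative choice.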
Let's create a small dataset composed of 10 observations and 5 features, 3 of them informative:
In: import numpy as np
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=10, n_features=5, n_informative=3,
                               n_redundant=0, random_state=101)
Now, let's measure the variance of each feature:
In: print("Variance:", np.var(X, axis=0))
Out: Variance: [ 2.50852168 1.47239461 0.80912826 1.51763426 ...
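To connect the measured variances with the selector, the following sketch fits VarianceThreshold on the same make_classification data and checks that its kept-feature mask matches a manual comparison of each variance against the threshold. The threshold of 1.0 is a hypothetical value picked for illustration; the exact indices kept depend on the generated data, so no output is shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold

# Same dataset as in the text: 10 observations, 5 features, 3 informative.
X, y = make_classification(n_samples=10, n_features=5, n_informative=3,
                           n_redundant=0, random_state=101)

# np.var defaults to ddof=0 (population variance).
variances = np.var(X, axis=0)

# Hypothetical cut-off, chosen for illustration only.
threshold = 1.0
selector = VarianceThreshold(threshold=threshold)
X_selected = selector.fit_transform(X)

# Features whose variance exceeds the threshold are the ones kept.
manual_mask = variances > threshold

print("Kept feature indices:", np.flatnonzero(selector.get_support()))
print("Shape before/after:", X.shape, X_selected.shape)
```

Because the threshold is compared against raw variances, features on larger scales are favored, which matters when features are measured in different units.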