April 2018
Beginner to intermediate
282 pages
6h 52m
English
Features with little or no variance carry almost no information from which an ML model can learn patterns. For example, a feature whose value is 5 for every record in a dataset is a constant, so it cannot help distinguish one record from another. Such features should be removed before training.
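To make the idea concrete, here is a minimal sketch in plain Python that computes the variance of each feature and flags the constant ones. The toy dataset and the variable names are assumptions made up for illustration.

```python
from statistics import pvariance

# Hypothetical toy dataset: each row is a record, each column is a feature.
records = [
    [5, 1.0, 10],
    [5, 2.0, 10],
    [5, 3.0, 12],
    [5, 4.0, 10],
]

# Transpose rows into columns so each feature can be examined on its own.
columns = list(zip(*records))

# Population variance per feature; a constant column has a variance of 0.
variances = [pvariance(col) for col in columns]

# Indices of the features whose value never changes across records.
constant_features = [i for i, v in enumerate(variances) if v == 0]

print(variances)
print(constant_features)
```

Here only the first feature (index 0) is constant, so it is the only one flagged as uninformative.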
We can use the VarianceThreshold class from scikit-learn's feature_selection module to remove all features whose variance does not exceed a given threshold. By default the threshold is 0, so only constant features are removed. The sklearn.feature_selection module implements feature selection algorithms; it currently includes univariate filter selection methods and the recursive feature elimination algorithm. The following is an example to illustrate this method:
%matplotlib ...
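A minimal, self-contained sketch of this approach, independent of the notebook listing above, might look like the following. The toy array and the threshold values are assumptions chosen for illustration; only VarianceThreshold, fit_transform, and get_support come from scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column 0 is constant, column 2 varies only slightly.
X = np.array([
    [5, 1.0, 0.2],
    [5, 2.0, 0.1],
    [5, 3.0, 0.2],
    [5, 4.0, 0.1],
])

# threshold=0.0 (the default) drops only features with zero variance.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)

# Boolean mask over the original columns: True means the feature was kept.
kept = selector.get_support()

# A stricter, arbitrarily chosen threshold also drops the near-constant column.
strict_selector = VarianceThreshold(threshold=0.01)
X_strict = strict_selector.fit_transform(X)
strict_kept = strict_selector.get_support()

print(X_reduced.shape, kept)
print(X_strict.shape, strict_kept)
```

Because variance depends on the scale of each feature, a single non-zero threshold is only meaningful when the features are on comparable scales.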