Best practice 8 – deciding whether or not to select features, and if so, how to do so

We saw in Chapter 7, Predicting Online Ads Click-through with Logistic Regression, how feature selection can be performed using L1-based regularized logistic regression and random forest. The benefits of feature selection include the following:

  • Reducing the training time of prediction models, as redundant or irrelevant features are eliminated
  • Reducing overfitting, for the same reason
  • Likely improving performance, as prediction models learn from data with more significant features
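The two approaches mentioned above can be sketched with scikit-learn's SelectFromModel, which wraps a fitted model and keeps only the features it considers important. This is a minimal illustration, not the book's Chapter 7 code: the synthetic dataset, the regularization strength C=0.05, and the top-10 cutoff for the random forest are assumptions chosen for demonstration. The drop in feature count is what drives the first two benefits listed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Assumed synthetic data: 50 features, of which only 5 carry signal
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           n_redundant=0, random_state=42)

# L1-regularized logistic regression pushes weak coefficients to exactly zero;
# for L1 models SelectFromModel keeps features whose |coef| exceeds a tiny threshold
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.05,
                              random_state=42)
l1_selector = SelectFromModel(l1_model).fit(X, y)
X_l1 = l1_selector.transform(X)

# Random forest ranks features by impurity-based importance;
# threshold=-np.inf makes the selector keep exactly max_features of them
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_selector = SelectFromModel(rf_model, threshold=-np.inf,
                              max_features=10).fit(X, y)
X_rf = rf_selector.transform(X)

print('Original:   ', X.shape[1], 'features')
print('L1-selected:', X_l1.shape[1], 'features')
print('RF-selected:', X_rf.shape[1], 'features')
```

The L1 route lets the regularization strength decide how many features survive, whereas the random forest route requires you to choose a cutoff yourself.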

Note that we used the word "likely" because there is no absolute certainty that feature selection will increase prediction accuracy. It is therefore good practice ...
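Because the gain is not guaranteed, one way to check it empirically is to compare cross-validated accuracy with and without feature selection. The sketch below is our own illustration under stated assumptions (synthetic data, L1-based selection with C=0.05, 5-fold cross-validation), not necessarily the procedure this best practice goes on to recommend. Wrapping the selector and classifier in a Pipeline means features are re-selected inside each training fold, so the held-out fold never influences which features are kept.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Assumed synthetic data, as in a typical high-dimensional setting
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           n_redundant=0, random_state=42)

# Baseline: the classifier trained on all features
baseline = LogisticRegression(max_iter=1000)

# Same classifier preceded by L1-based feature selection;
# the Pipeline re-fits the selector on each training fold only
with_selection = Pipeline([
    ('select', SelectFromModel(
        LogisticRegression(penalty='l1', solver='liblinear', C=0.05))),
    ('clf', LogisticRegression(max_iter=1000)),
])

scores_all = cross_val_score(baseline, X, y, cv=5, scoring='accuracy')
scores_sel = cross_val_score(with_selection, X, y, cv=5, scoring='accuracy')

print(f'All features:      {scores_all.mean():.3f} +/- {scores_all.std():.3f}')
print(f'Selected features: {scores_sel.mean():.3f} +/- {scores_sel.std():.3f}')
```

If the selected-feature pipeline scores no better than the baseline, feature selection may still be worth keeping for faster training, but it should not be assumed to improve accuracy.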
