By intelligently extracting the most important signals from our data and ignoring noise, feature selection algorithms achieve two major outcomes:
- Improved model performance: Removing redundant and irrelevant features makes us less likely to base decisions on noise and lets our models home in on the features that matter, which improves the pipeline's predictive performance.
- Reduced training and prediction time: Fitting pipelines to fewer features generally shortens both fitting and prediction times, making our pipelines faster overall.
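To make these two outcomes concrete, here is a minimal sketch of one simple kind of feature selection: a filter that ranks columns by their absolute Pearson correlation with the target and keeps the top `k`. Everything in it is an illustrative assumption rather than a method from this chapter: the synthetic data (two signal columns followed by eight pure-noise columns), the helper names `pearson` and `select_top_k`, and the choice of correlation as the scoring rule. It uses only the Python standard library so that it runs anywhere.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def select_top_k(rows, target, k):
    """Keep the k columns whose |correlation| with the target is largest.

    Hypothetical helper for illustration: returns the kept column
    indices (in their original order) and the reduced rows.
    """
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], target))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in rows]


# Synthetic data: columns 0 and 1 drive the target, columns 2-9 are noise.
rng = random.Random(0)
rows, target = [], []
for _ in range(500):
    signal_a = rng.gauss(0, 1)
    signal_b = rng.gauss(0, 1)
    noise = [rng.gauss(0, 1) for _ in range(8)]
    rows.append([signal_a, signal_b] + noise)
    target.append(3 * signal_a - 2 * signal_b + rng.gauss(0, 0.5))

kept, reduced = select_top_k(rows, target, k=2)
print("kept columns:", kept)
print("columns before:", len(rows[0]), "after:", len(reduced[0]))
```

Any model fit on `reduced` now sees one fifth of the original columns, which is where the time savings come from, and the columns that remain are the ones that actually relate to the target. A correlation filter like this only detects linear, one-feature-at-a-time relationships, so treat it as a starting point rather than a general-purpose selector.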
To gain a realistic understanding of how and why noisy data gets in the way, let's introduce our newest dataset, ...