May 2019
Intermediate to advanced
664 pages
15h 41m
English
So far, we've looked at several feature selection techniques, such as regularization, stepwise selection, and recursive feature elimination. I now want to introduce an effective feature selection method for classification problems that uses random forests, implemented in the Boruta package. A paper describing the package explains how it identifies all of the relevant features: Kursa M., Rudnicki W. (2010), Feature Selection with the Boruta Package, Journal of Statistical Software, 36(11), 1-13.
Here, I'll provide an overview of the algorithm and then apply it to the simulated dataset. I've found it to be highly effective at eliminating unimportant features, but be advised that it can be computationally intensive. ...