Quantifying variable importance with Monte Carlo simulation
Finding the smallest subset of all possible input variables that result in an accurate model (that is, a parsimonious solution) is often the biggest challenge for many data mining projects. It's common for data sets to contain 10s to 100s of input variables. Models that are over-trained or simply fail to build are both possible with so called "wide" data sets. Removing unimportant variables to find the sweet spot between model accuracy and stability is where experienced data miners can deliver significant value.
The primary method of variable selection in Modeler is Feature Selection. The Feature Selection process identifies the significance of each variable individually. Statistically ...
Get IBM SPSS Modeler Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.