October 2013
Beginner to intermediate
382 pages
9h 55m
English
Finding the smallest subset of all possible input variables that result in an accurate model (that is, a parsimonious solution) is often the biggest challenge for many data mining projects. It's common for data sets to contain 10s to 100s of input variables. Models that are over-trained or simply fail to build are both possible with so called "wide" data sets. Removing unimportant variables to find the sweet spot between model accuracy and stability is where experienced data miners can deliver significant value.
The primary method of variable selection in Modeler is Feature Selection. The Feature Selection process identifies the significance of each variable individually. Statistically ...
Read now
Unlock full access