Evaluating balancing with Auto Classifier

Two traps to avoid in data mining are assuming that one should always balance and assuming that there is only one way to balance. Like most questions that arise during a data mining project, the question of whether to balance should be answered empirically. This recipe shows how three common kinds of balancing can be compared easily using the Auto Classifier node. The resulting models are not meant to be final models; rather, this is an early test for deciding whether or not to balance. One of the kinds of balancing suggested here is not to balance at all. Another is to double the numbers in a fully reduced Balance node.
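To make the three options concrete, here is a minimal Python sketch of the sampling factors each one implies. It is an illustration outside Modeler, not the Balance node's own logic. It assumes that "fully reduced" means sampling each larger class down to the size of the rarest class, and that "doubling the numbers" means doubling those factors while capping them at 1.0 so that no class is boosted. The function names and the 900/100 class counts are hypothetical.

```python
from collections import Counter


def balance_factors(labels, multiplier=1.0):
    """Per-class sampling factors that shrink larger classes toward the rarest one.

    multiplier=1.0 gives a full reduction; multiplier=2.0 doubles each factor.
    Factors are capped at 1.0 (an assumption) so the scheme never boosts a class.
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    return {cls: min(1.0, multiplier * smallest / n) for cls, n in counts.items()}


def expected_counts(labels, factors):
    """Approximate number of records per class after sampling with the factors."""
    counts = Counter(labels)
    return {cls: round(n * factors[cls]) for cls, n in counts.items()}


# Hypothetical target field: 900 "no" records and 100 "yes" records.
labels = ["no"] * 900 + ["yes"] * 100

schemes = {
    "none": {cls: 1.0 for cls in set(labels)},      # no balancing at all
    "full_reduce": balance_factors(labels, 1.0),    # majority cut to minority size
    "doubled": balance_factors(labels, 2.0),        # the same factors, doubled
}

for name, factors in schemes.items():
    print(name, expected_counts(labels, factors))
```

Under these assumptions the doubled scheme keeps roughly twice as many majority records as the full reduction, so it sits between the two extremes. Which of the three gives the best models is exactly what the recipe leaves to the empirical comparison.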

Getting ready

We will start with the ...
