
3.4 Divide-and-Conquer Approach
Suppose we need to determine whether further testing on attribute X is needed.
The chi-square value is calculated by applying the formula above to the values
of attribute X. If the value is lower than a pre-assigned threshold (the critical
chi-square value at a chosen confidence level, say 95%), we cannot reject the
hypothesis that the test on attribute X is irrelevant to the classification, so
further dividing of the dataset on X is not needed. If the value is higher than
the threshold, the test on the attribute is necessary. If no attribute is found
to be relevant, the tree should stop growing at that node. A decision tree built
this way helps avoid over-fitting.
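The relevance test and the stopping rule can be sketched in Python. This is a minimal sketch, assuming that the "formula above" is Pearson's chi-square statistic over the contingency table of attribute values versus class labels, with (values - 1)(classes - 1) degrees of freedom, and reading the 95% threshold as the critical value at significance level 0.05. The names `chi_square_statistic`, `split_is_relevant`, `should_stop`, and `CHI2_CRITICAL_95` are hypothetical, and the critical-value table is hard-coded only to keep the example dependency-free.

```python
from collections import Counter

# Assumed: chi-square critical values at the 95% confidence level
# (significance 0.05), indexed by degrees of freedom 1..10.
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_statistic(attr_values, labels):
    """Pearson chi-square statistic and degrees of freedom for the
    attribute-value x class contingency table (hypothetical helper)."""
    n = len(labels)
    joint = Counter(zip(attr_values, labels))   # observed cell counts
    value_counts = Counter(attr_values)         # row totals
    class_counts = Counter(labels)              # column totals
    stat = 0.0
    for v, n_v in value_counts.items():
        for c, n_c in class_counts.items():
            expected = n_v * n_c / n            # count expected under independence
            observed = joint.get((v, c), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(value_counts) - 1) * (len(class_counts) - 1)
    return stat, dof


def split_is_relevant(attr_values, labels):
    """True if the statistic exceeds the 95% critical value, i.e. we reject
    the hypothesis that the attribute is irrelevant to the class."""
    stat, dof = chi_square_statistic(attr_values, labels)
    if dof == 0:
        return False  # only one attribute value or one class: nothing to test
    if dof not in CHI2_CRITICAL_95:
        raise ValueError(f"no tabulated critical value for {dof} degrees of freedom")
    return stat > CHI2_CRITICAL_95[dof]


def should_stop(rows, labels, attributes):
    """Stop growing at this node when no attribute passes the test.
    `rows` is assumed to be a list of dicts mapping attribute name to value."""
    return not any(
        split_is_relevant([row[a] for row in rows], labels) for a in attributes
    )
```

For example, an attribute whose values line up exactly with the classes over 20 rows yields a statistic of 20.0 with 1 degree of freedom, well above 3.841, so the split is kept. An attribute whose values are spread evenly across both classes yields 0.0, and a node offering only that attribute stops growing.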
A Problem ...