Recall from Figure 14 (the classification tree for the test part of the German credit data problem) that some rules covered very few observations; rule 16, for example, covered only 22. With about 600 observations in total, each of these rules covers less than five percent of the data. This is one reason to suspect that we have overfitted the data. Using the minsplit option, we can set the minimum number of observations a node must contain before a split is attempted, which keeps the tree from producing rules that cover too few observations.
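The coverage argument above is simple arithmetic, and a minimal sketch may make it concrete. The helper name `low_coverage_rules` and the five percent cutoff as a default are illustrative assumptions, and pairing rule 16 with 22 observations is a reading of the text rather than something taken from the fitted tree. The sketch only flags thinly covered rules; it does not reproduce what minsplit does during tree growing.

```python
def low_coverage_rules(rule_counts, n_total, min_fraction=0.05):
    """Return the rules whose coverage is below min_fraction of n_total.

    rule_counts maps a rule (terminal node) number to the number of
    observations it covers. This is a hypothetical helper for illustration.
    """
    threshold = min_fraction * n_total
    return {rule: count for rule, count in rule_counts.items() if count < threshold}


# Assumed from the text: rule 16 covers 22 of roughly 600 observations.
rule_counts = {16: 22}
flagged = low_coverage_rules(rule_counts, n_total=600)

for rule, count in flagged.items():
    # Report the share of the data each flagged rule covers.
    print(f"rule {rule}: {count} observations ({count / 600:.1%} of the data)")
```

Rules flagged this way are the ones a larger minimum node size would tend to prevent from being created.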
Another technical way of reducing the complexity of a classification tree is by "pruning" the tree. Here, the least important splits are recursively snipped ...