92 Mining Your Own Business in Telecoms Using DB2 Intelligent Miner for Data
Decision tree
As described in 5.3, “Sourcing and preprocessing the data” on page 81, you
should use the error weighting function or oversampling if there are few
churners in the data set. Here, we used the customer data set, which has a
churn rate of 2%. When the tree algorithm tries to classify the churners, it
may classify all churners as staying, which yields only a 2% error rate for
the entire tree; the algorithm does not consider that a bad result.
The maximum tree depth can also be set. In this case, it was limited to 10. A
tree with more leaf nodes is prone to overfitting. In other words, even though
a deeper tree has a lower error rate on the training data, it will not
generalize well to other data sets and is harder to interpret.
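The reason a depth cap helps can be seen from a quick count: each extra level can at most double the number of leaf nodes, so the number of distinct rules a binary tree can memorize grows exponentially with depth. This is a back-of-the-envelope sketch, not IM for Data's internal representation:

```python
# Upper bound on leaf count for a binary tree at each depth up to the cap.
max_depth = 10
leaves_per_depth = [2 ** d for d in range(max_depth + 1)]

# At the depth limit of 10 used here, the tree can already carve the
# customer base into up to 1024 segments -- plenty of room to overfit.
max_leaves = leaves_per_depth[-1]
```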
Pruning is the process of merging nodes and branches together to improve the
tree's performance and interpretability.
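One common pruning rule, collapsing sibling leaves that make the same prediction into their parent, can be sketched as follows. The dictionary-based tree structure here is hypothetical, purely for illustration; IM for Data's actual pruning criteria are not shown:

```python
# Recursively merge sibling leaves that agree on the predicted class.
def prune(node):
    if "children" not in node:            # already a leaf
        return node
    left, right = (prune(child) for child in node["children"])
    # If both children are leaves with the same prediction, the split
    # adds no information, so collapse it into a single leaf.
    if ("children" not in left and "children" not in right
            and left["predict"] == right["predict"]):
        return {"predict": left["predict"]}
    return {"children": [left, right]}

# Toy tree: every path ends in "stay", so pruning collapses it entirely.
tree = {"children": [
    {"predict": "stay"},
    {"children": [{"predict": "stay"}, {"predict": "stay"}]},
]}
pruned = prune(tree)
```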
RBF (Radial Basis Function)
In this case, we used the variables that were mainly considered in the
decision tree. However, because of the characteristics of the neural network,
you may use all the available variables for the initial run and then identify
the optimum variables afterwards.
Given that the churn rate is less than 5%, and RBF has no function like error
weighting, we used stratified samples (raising the churner rate to 20%). If
you use balanced samples (a 50% churn rate in the training set), the
performance of the model would be higher still.
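The stratified sampling step can be sketched in a few lines: keep every churner and draw only enough stayers to reach the target 20% churner rate. The record layout below is hypothetical, chosen only to make the ratio arithmetic concrete:

```python
import random

random.seed(0)

# Hypothetical customer records with a 2% churn rate, as in the text.
customers = [{"id": i, "churn": i < 200} for i in range(10_000)]

churners = [c for c in customers if c["churn"]]
stayers = [c for c in customers if not c["churn"]]

# Keep all churners; draw stayers so churners are 20% of the sample (1:4).
target_churn_rate = 0.20
n_stayers = int(len(churners) * (1 - target_churn_rate) / target_churn_rate)
sample = churners + random.sample(stayers, n_stayers)

rate = sum(c["churn"] for c in sample) / len(sample)
```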
Note: IM for Data has an error weighting function which prevents the algorithm
from classifying all churners as staying. In this case, we gave an error
weight of 10, which means that if the algorithm classifies all churners as
staying, the tree error rate becomes 20% instead of 2%. You can adjust the
value of the error weight after seeing the tree result.
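The arithmetic behind the note is straightforward: with a 2% churn rate, a model that labels everyone as staying misclassifies 2% of customers, and counting each of those errors 10 times raises the reported error rate to 20%. A minimal sketch of that calculation (the counts are illustrative, not from the case study's actual data set):

```python
# 2% of a hypothetical 10,000-customer data set are churners.
n_total = 10_000
churn_rate = 0.02
n_churners = int(n_total * churn_rate)  # 200 churners

# Naive model: predict "staying" for everyone, so every churner is wrong.
plain_error = n_churners / n_total      # 2% -- looks acceptable to the tree

# With an error weight of 10, each misclassified churner counts 10 times,
# so the same naive model now shows a 20% error rate.
error_weight = 10
weighted_error = (error_weight * n_churners) / n_total
```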
Note: IM for Data has a built-in automatic pruning algorithm and stopping
criteria, such as node size, tree depth, and accuracy, and it also allows the
user to prune manually. Here, the tree was pruned automatically by IM for
Data, and some manual pruning was done where a branch had no meaning from a
marketing perspective.