Different models can be combined in sequence or in parallel to obtain better results.
Some example combinations are:
- Building predictive models on top of segmentations: segment your
population first and use the segment number as an input variable, or build
models only on the customers in certain segments.
- Running multiple predictive models and using the results as votes: different
models confirming the same prediction, such as the propensity to purchase a
product, give the result a higher confidence (see the sketch after this list).
- Building different models on different parts of your data set.
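To illustrate the voting idea, here is a minimal sketch using scikit-learn; the library, the model types, and the synthetic data are assumptions for illustration, not the book's IM for Data workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for a customer data set with a rare positive class.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)

# "Soft" voting averages the predicted probabilities of the three models,
# so a prediction confirmed by several models gets a higher confidence.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=10, random_state=0)),
        ("logit", LogisticRegression(max_iter=1000)),
        ("mlp", MLPClassifier(max_iter=500, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
confidence = ensemble.predict_proba(X)[:, 1]  # combined churn confidence
```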
5.5.2 Applying the mining technique
The performance of the prediction model can depend on which techniques and
which variables are used. Therefore, in this case, several mining techniques
were tried; the performance of each model was compared, and the best model
was chosen.
The modeling process is shown in Figure 5-4.
Figure 5-4 Modeling process - applying each mining technique (RBF, tree, neural) to the input data and selecting the best model with a gains chart
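To make the gains chart comparison concrete, here is a minimal sketch of computing a cumulative gains curve; numpy and the argument names are assumptions for illustration, not the book's tooling.

```python
import numpy as np

def cumulative_gains(y_true, scores):
    """Cumulative gains curve: the fraction of all churners captured within
    the top-scored fraction of customers. The model whose curve rises
    fastest is chosen as the best model."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)[::-1]              # highest churn score first
    captured = np.cumsum(y_true[order]) / y_true.sum()
    depth = np.arange(1, len(y_true) + 1) / len(y_true)
    return depth, captured
```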
Selecting the right variables
There are usually a large number of candidate variables for modeling. To
identify the most relevant variables you can use bivariate statistics. This
gives you a list of the variables ordered by the chi-squared statistic, a
measure of how much the distribution of each variable differs from that of
the whole population. This is covered in 5.4, "Evaluating the data" on
page 86.
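A minimal sketch of such a chi-squared ranking follows, assuming scipy and pandas and a DataFrame with a binary churn column; the names are illustrative, not the book's tooling.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def rank_by_chi_squared(df: pd.DataFrame, target: str = "churn"):
    """Rank candidate variables by their chi-squared statistic against the
    target: larger values mean the variable's distribution differs more
    between churners and the rest of the population."""
    scores = {}
    for col in df.columns:
        if col == target:
            continue
        table = pd.crosstab(df[col], df[target])  # contingency table
        chi2, _p, _dof, _expected = chi2_contingency(table)
        scores[col] = chi2
    # most discriminating variables first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Note that this treats every variable as categorical; continuous variables would need to be binned first for the statistic to be meaningful.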
Decision tree
As described in 5.3, "Sourcing and preprocessing the data" on page 81, you
should use the error weighting function or oversampling if there are few
churners in the data set. Here, we used the customer data set, which has a
churn rate of 2%. When the tree algorithm tries to classify the churners, it
may classify all of the churners as staying, which yields an error rate of
only 2% for the entire tree, a result the algorithm does not consider bad.

Note: IM for Data has an error weighting function that prevents the algorithm
from classifying all of the churners as staying. In this case, we used an
error weight of 10, which means that if the algorithm classifies all churners
as staying, the tree error rate becomes 20% instead of 2%. You can adjust the
error weight after seeing the tree result.
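IM for Data's error weighting function has no exact counterpart in every tool, but the same effect can be sketched with class weights; the scikit-learn call and the synthetic data below are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data with roughly a 2% positive (churn) rate, standing in for the
# book's customer data set.
X, y = make_classification(n_samples=5000, weights=[0.98], random_state=0)

# class_weight plays the role of the error weight: misclassifying a churner
# (class 1) costs 10 times as much as misclassifying a stayer, so the tree
# can no longer look good by labeling every customer "staying".
tree = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0)
tree.fit(X, y)
```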
The maximum tree depth can also be set; in this case, it was limited to 10.
A tree with more leaf nodes can suffer from overfitting: even though a deeper
tree has a lower error rate on the training data, it will not generalize to
other data sets and is harder to interpret.
Pruning is the process of merging nodes and branches to improve the
performance and interpretability of the tree.

Note: IM for Data has a built-in automatic pruning algorithm with stopping
criteria such as node size, tree depth, and accuracy, and it also allows the
user to prune manually. Here, the tree was pruned automatically by IM for
Data, and some manual pruning was done where a branch had no meaning from a
marketing perspective.
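Pruning can be sketched with scikit-learn's cost-complexity pruning, again an illustrative substitute for IM for Data's own pruning rather than the book's implementation.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, weights=[0.98], random_state=0)

# cost_complexity_pruning_path returns the sequence of effective alphas;
# a larger ccp_alpha prunes more nodes, trading a little training accuracy
# for a smaller, more interpretable tree. In practice the alpha would be
# chosen on validation data, not hard-coded as it is here.
path = DecisionTreeClassifier(
    max_depth=10, random_state=0
).cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(
    max_depth=10, ccp_alpha=path.ccp_alphas[-2], random_state=0
)
pruned.fit(X, y)
```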
RBF (Radial Basis Function)
In this case, we mainly used the variables that proved important in the
decision tree. However, because of the characteristics of a neural network,
you may use all available variables for the initial run and identify the
optimal set of variables afterwards.

Because the churn rate is less than 5% and RBF has no function like error
weighting, we used stratified samples in which the churn rate is raised to
20%. If you use balanced samples (a 50% churn rate in the training set), the
performance of the model could be higher still.
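A minimal sketch of such stratified oversampling follows, assuming pandas and a DataFrame with a binary churn column; the names are illustrative, not from the book.

```python
import pandas as pd

def oversample_churners(df, target="churn", desired_rate=0.20, seed=42):
    """Duplicate churner rows until they make up roughly desired_rate of
    the training set (for example, raising a 2% churn rate to 20%)."""
    churners = df[df[target] == 1]
    stayers = df[df[target] == 0]
    # number of churner rows needed for the desired rate among the stayers
    n_needed = int(desired_rate / (1 - desired_rate) * len(stayers))
    boosted = churners.sample(n=n_needed, replace=True, random_state=seed)
    # shuffle so churners are not clustered at the end of the frame
    return pd.concat([stayers, boosted]).sample(frac=1, random_state=seed)
```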
