Different models can be combined in sequence or in parallel to obtain better results.
Some example combinations are:
- Building predictive models on top of segmentations: segment your
population first and use the segment number as an input variable, or build
models only on the customers in certain segments.
- Running multiple predictive models and using the results as votes: different
models confirming the same prediction, such as the propensity to purchase a
product, give the result a higher confidence (see the sketch after this list).
- Building different models on different parts of your data set.
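To illustrate the voting idea, here is a minimal sketch using scikit-learn; the library, the model types, and the synthetic data are assumptions for illustration, not the book's IM for Data workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for a customer data set with a rare positive class.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)

# "Soft" voting averages the predicted probabilities of the three models,
# so a prediction confirmed by several models gets a higher confidence.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=10, random_state=0)),
        ("logit", LogisticRegression(max_iter=1000)),
        ("mlp", MLPClassifier(max_iter=500, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
confidence = ensemble.predict_proba(X)[:, 1]  # combined churn confidence
```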
5.5.2 Applying the mining technique
The performance of the prediction model can depend on which techniques and
which variables are used. Therefore, in this case, several mining techniques
were tried; the performance of each model was compared, and the best model
was chosen.
The modeling process is shown in Figure 5-4.
Figure 5-4 Modeling process - applying each mining technique (RBF, tree, neural) to the input data and selecting the best model with a gains chart
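To make the gains chart comparison concrete, here is a minimal sketch of computing a cumulative gains curve; numpy and the argument names are assumptions for illustration, not the book's tooling.

```python
import numpy as np

def cumulative_gains(y_true, scores):
    """Cumulative gains curve: the fraction of all churners captured within
    the top-scored fraction of customers. The model whose curve rises
    fastest is chosen as the best model."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)[::-1]              # highest churn score first
    captured = np.cumsum(y_true[order]) / y_true.sum()
    depth = np.arange(1, len(y_true) + 1) / len(y_true)
    return depth, captured
```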
Selecting the right variables
There are usually a large number of candidate variables for modeling. To
identify the most relevant variables you can use bivariate statistics. This
gives you a list of the variables ordered by the chi-squared statistic, a
measure of how much the distribution of each variable differs from that of
the whole population. This is covered in 5.4, "Evaluating the data" on
page 86.
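A minimal sketch of such a chi-squared ranking follows, assuming scipy and pandas and a DataFrame with a binary churn column; the names are illustrative, not the book's tooling.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def rank_by_chi_squared(df: pd.DataFrame, target: str = "churn"):
    """Rank candidate variables by their chi-squared statistic against the
    target: larger values mean the variable's distribution differs more
    between churners and the rest of the population."""
    scores = {}
    for col in df.columns:
        if col == target:
            continue
        table = pd.crosstab(df[col], df[target])  # contingency table
        chi2, _p, _dof, _expected = chi2_contingency(table)
        scores[col] = chi2
    # most discriminating variables first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Note that this treats every variable as categorical; continuous variables would need to be binned first for the statistic to be meaningful.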
Decision tree
As described in 5.3, "Sourcing and preprocessing the data" on page 81, you
should use the error weighting function or oversampling if there are few
churners in the data set. Here, we used the customer data set, which has a
churn rate of 2%. When the tree algorithm tries to classify the churners, it
may classify all of the churners as staying, which yields an error rate of
only 2% for the entire tree, a result the algorithm does not consider bad.

Note: IM for Data has an error weighting function that prevents the algorithm
from classifying all of the churners as staying. In this case, we used an
error weight of 10, which means that if the algorithm classifies all churners
as staying, the tree error rate becomes 20% instead of 2%. You can adjust the
error weight after seeing the tree result.
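IM for Data's error weighting function has no exact counterpart in every tool, but the same effect can be sketched with class weights; the scikit-learn call and the synthetic data below are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data with roughly a 2% positive (churn) rate, standing in for the
# book's customer data set.
X, y = make_classification(n_samples=5000, weights=[0.98], random_state=0)

# class_weight plays the role of the error weight: misclassifying a churner
# (class 1) costs 10 times as much as misclassifying a stayer, so the tree
# can no longer look good by labeling every customer "staying".
tree = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0)
tree.fit(X, y)
```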
The maximum tree depth can also be set; in this case, it was limited to 10.
A tree with more leaf nodes can suffer from overfitting: even though a deeper
tree has a lower error rate on the training data, it will not generalize to
other data sets and is harder to interpret.
Pruning is the process of merging nodes and branches to improve the
performance and interpretability of the tree.

Note: IM for Data has a built-in automatic pruning algorithm with stopping
criteria such as node size, tree depth, and accuracy, and it also allows the
user to prune manually. Here, the tree was pruned automatically by IM for
Data, and some manual pruning was done where a branch had no meaning from a
marketing perspective.
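Pruning can be sketched with scikit-learn's cost-complexity pruning, again an illustrative substitute for IM for Data's own pruning rather than the book's implementation.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, weights=[0.98], random_state=0)

# cost_complexity_pruning_path returns the sequence of effective alphas;
# a larger ccp_alpha prunes more nodes, trading a little training accuracy
# for a smaller, more interpretable tree. In practice the alpha would be
# chosen on validation data, not hard-coded as it is here.
path = DecisionTreeClassifier(
    max_depth=10, random_state=0
).cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(
    max_depth=10, ccp_alpha=path.ccp_alphas[-2], random_state=0
)
pruned.fit(X, y)
```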
RBF (Radial Basis Function)
In this case, we mainly used the variables that proved important in the
decision tree. However, because of the characteristics of a neural network,
you may use all available variables for the initial run and identify the
optimal set of variables afterwards.

Because the churn rate is less than 5% and RBF has no function like error
weighting, we used stratified samples in which the churn rate is raised to
20%. If you use balanced samples (a 50% churn rate in the training set), the
performance of the model could be higher still.
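A minimal sketch of such stratified oversampling follows, assuming pandas and a DataFrame with a binary churn column; the names are illustrative, not from the book.

```python
import pandas as pd

def oversample_churners(df, target="churn", desired_rate=0.20, seed=42):
    """Duplicate churner rows until they make up roughly desired_rate of
    the training set (for example, raising a 2% churn rate to 20%)."""
    churners = df[df[target] == 1]
    stayers = df[df[target] == 0]
    # number of churner rows needed for the desired rate among the stayers
    n_needed = int(desired_rate / (1 - desired_rate) * len(stayers))
    boosted = churners.sample(n=n_needed, replace=True, random_state=seed)
    # shuffle so churners are not clustered at the end of the frame
    return pd.concat([stayers, boosted]).sample(frac=1, random_state=seed)
```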
