58 Mining Your Own Business in Telecoms Using DB2 Intelligent Miner for Data
These two techniques approach the same problem from different angles. They can
be used to complement each other and to gain confidence that the segmentation
produced is valid and optimal.
4.5.2 Applying the mining techniques
Behavioral data is used as active variables to construct the segmentation
model, while demographic and additional data are used only as supplementary
variables to describe the segments. Supplementary variables can be identified
in the graphical segmentation output by the square brackets [] around the
variable name, as shown in Figure 4-2. The following are some considerations
for applying the clustering techniques.
Neural clustering
As we explained earlier, neural clustering does not require us to specify
things like similarity, only the maximum number of clusters we want to derive.
In this particular case, the number of segments chosen is nine. The reason for
choosing nine is that the objective of behavioral segmentation is to identify
customer characteristics, and the marketing department uses the result as a
foundation for Customer Relationship Management. Typically, it is neither easy
nor efficient for marketing personnel to deal with more than nine customer
segments when deploying the result.
You can specify the maximum number of segments you want to generate.
However, the decision depends on where and how the segmentation result will
be used, and should be considered from a marketing perspective. For example, a
fraud detection application needs many more segments in order to find abnormal
behaviors in very small segments, which may exist in huge call transaction
data sets. Fraud detection in the telecoms industry is discussed in detail in
Chapter 7, "Can you determine the characteristics of known and unknown
fraudulent behavior?" on page 131.
Note: Both algorithms generate a score (ranging from 0 to 1) that represents
how well each record fits its assigned cluster, in other words, its cluster
membership. In demographic clustering, a record fits its cluster better the
closer the score is to 1, whereas in neural clustering a record fits its
cluster better the closer the score is to 0.
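Because the two scores point in opposite directions, it can help to align them before comparing the output of the two techniques. The following is a minimal sketch, not part of IM for Data: the function name `fit_quality` and the technique labels are hypothetical, and the simple inversion only aligns the direction of the scores. It does not make their magnitudes comparable.

```python
def fit_quality(score: float, technique: str) -> float:
    """Map a cluster-membership score to a scale where 1.0 is the best fit.

    Hypothetical helper: only the direction of the score is aligned.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if technique == "demographic":
        return score          # closer to 1 already means a better fit
    if technique == "neural":
        return 1.0 - score    # closer to 0 means a better fit, so invert
    raise ValueError(f"unknown technique: {technique}")
```

With this convention, a demographic score of 0.9 and a neural score of 0.1 both indicate a record that fits its cluster well.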
Chapter 4. How to discover the characteristics of your customers 59
Usually, neural clustering requires the input data to be normalized or scaled
to a range of 0.0 to 1.0. In addition, categorical values are converted into a
numeric code for presentation to the neural network. If you have categorical
fields with many distinct values, the performance of the mining run may be
degraded because the neural network becomes very large.
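To illustrate why many distinct categorical values enlarge the network, the sketch below scales a numeric field to the range 0.0 to 1.0 and encodes a categorical field with one input per distinct value. This is an assumption for illustration only: the text does not say which numeric coding IM for Data uses, and the field names and values are invented.

```python
def min_max_scale(values):
    """Scale numeric values linearly into the range 0.0 to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant field carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each distinct category as its own 0/1 input (assumed coding)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[index[v]] = 1.0
        rows.append(row)
    return categories, rows


# Invented example fields for four customers.
monthly_minutes = [120, 480, 300, 60]
tariff_plan = ["basic", "premium", "basic", "family"]

scaled = min_max_scale(monthly_minutes)
categories, encoded = one_hot(tariff_plan)

# One network input for the numeric field plus one per distinct category.
inputs = [[s] + row for s, row in zip(scaled, encoded)]
print(len(inputs[0]))
```

Under this coding, a categorical field with several hundred distinct values would add several hundred inputs, which is the kind of growth that slows a mining run.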
Demographic clustering
The major difference between the two clustering techniques is that demographic
clustering tries to discover the number of clusters automatically, while
neural clustering requires you to specify how many clusters you want to derive.
To determine the number of clusters automatically, demographic clustering
requires you to specify how similar the customers within a cluster should be.
This is the similarity threshold. The threshold ranges between 0 and 1, where
a value of 1 means that all customers in a cluster must be identical, and a
value of 0 means that customers can be completely different. If two customers
are more similar than the threshold value, then they are candidates for being
put into the same cluster.
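The effect of the threshold on the number of clusters can be sketched with a simple single-pass grouping. This is not the algorithm used by IM for Data: the similarity measure (fraction of matching fields), the function names, and the customer records are all assumptions chosen for illustration. The comparison uses "at least as similar as" so that a threshold of 1 admits identical records.

```python
def similarity(a, b):
    """Fraction of fields on which two records agree (0 = none, 1 = identical)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def threshold_clusters(records, threshold):
    """Place each record in the first cluster whose members are all
    at least `threshold` similar to it; otherwise start a new cluster."""
    clusters = []
    for record in records:
        for cluster in clusters:
            if all(similarity(record, m) >= threshold for m in cluster):
                cluster.append(record)
                break
        else:
            clusters.append([record])   # no existing cluster was similar enough
    return clusters


# Invented records: (contract type, usage level, main service).
customers = [
    ("prepaid", "low", "sms"),
    ("prepaid", "low", "voice"),
    ("postpaid", "high", "data"),
    ("postpaid", "high", "data"),
    ("postpaid", "low", "data"),
]

for t in (0.0, 0.6, 1.0):
    print(t, len(threshold_clusters(customers, t)))
```

In this sketch a threshold of 0 places every customer in one cluster, a threshold of 1 groups only identical customers, and values in between yield an intermediate number of clusters without the number being specified in advance.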
Note: When you try to determine the optimum number of clusters, it is
important to keep in mind that the most important criterion is how usable the
result is from a business point of view.
Note to experts: The neural mining techniques in IM for Data automatically
normalize the input data. If you are an expert on neural networks and you
want to do your own normalization, you can suppress this normalization. In
this case, all input fields must have values between 0.0 and 1.0.
