Chapter 4. How can I characterize my customers from the mix of products that they purchase? 79
A simple visual inspection of Figure 4-8 and Figure 4-9, using the Shopper Type
as a guide, seems to show that, apart from the cluster numbers assigned and
the order in which the clusters are arranged in the two output visualizers, the two sets
of cluster results are very similar.
However, on closer inspection we will see that this similarity is somewhat superficial and
that the segments produced are not as similar as this simple analysis would
suggest. So what is happening, and how do we understand what the two sets of
results really mean? We explain this in the next section.
4.6 Interpreting the results
In the previous section we looked at the steps we have to follow to get our mining
results using two different clustering techniques. The sixth stage in our generic
data mining method is to interpret the results that we have obtained and determine
how we can map them onto our business. When you are first confronted with the
cluster results, the first question that you are going to ask is “What does it all
mean?” In this section we describe how to read and interpret
the results from the different clustering techniques and, more importantly, how you
can compare the results from one clustering technique with those from another.
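As a rough illustration of what comparing two sets of cluster results involves, the sketch below cross-tabulates the cluster numbers that two techniques assign to the same customers. It is not a feature of the visualizers and not the method used in this chapter. The function names and the sample cluster numbers are hypothetical and chosen only for the example.

```python
from collections import Counter


def cross_tabulate(assignments_a, assignments_b):
    """Count customers for each (cluster in A, cluster in B) pair.

    Both lists must hold one cluster number per customer, in the same
    customer order.
    """
    if len(assignments_a) != len(assignments_b):
        raise ValueError("both techniques must score the same customers")
    return Counter(zip(assignments_a, assignments_b))


def best_match_share(table):
    """For each cluster in A, return the share of its customers that fall
    into its single most common cluster in B (1.0 means a perfect match)."""
    totals = Counter()
    best = {}
    for (cluster_a, _cluster_b), count in table.items():
        totals[cluster_a] += count
        best[cluster_a] = max(best.get(cluster_a, 0), count)
    return {cluster_a: best[cluster_a] / totals[cluster_a] for cluster_a in totals}


# Hypothetical assignments for six customers (illustrative data only).
demographic_ids = [8, 8, 8, 8, 3, 3]
other_ids = [2, 2, 2, 5, 5, 5]

table = cross_tabulate(demographic_ids, other_ids)
shares = best_match_share(table)
print(shares)
```

A share well below 1.0 for a cluster suggests that the second technique splits those customers across several of its own clusters, even if the two visualizations look alike at first glance.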
4.6.1 How to read and interpret the cluster results?
The cluster techniques that we have used both produce results that can be
displayed graphically, as we have seen. We can also obtain additional visual
information by highlighting individual clusters and even individual variables.
There is also another level of detailed information which gives us the statistical
information that we need to fully interpret what the clusters are telling us about
our customers. Although the visualized results are important in giving us an
overall impression of what is happening, the interpretation always needs to be
backed up with the statistical detail to confirm our understanding. In this section
we look at how this is done.
As we discussed in 4.2, “The data to be used” on page 49, the first thing you
need to understand about the visualized results is that the graphs and charts are
telling you about the characteristics of customers who have been put into the
cluster and how these customers differ from the population as a whole. They do not
tell you directly how the customers in one cluster differ from customers in
another cluster. This is an important distinction and we will return to this issue
80 Mining Your Own Business in Retail Using DB2 Intelligent Miner for Data
When we describe our clusters, therefore, we are looking at the characteristic
variables of our customers in each cluster, and using these to describe how
customers in this cluster differ from the customers in all other clusters. To do this
we want to choose those variables that make the greatest difference, and
fortunately the order in which the variables are presented to us (using the
chi-square statistic) gives us just what we need.
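To make the idea of ordering variables by a chi-square statistic more concrete, here is a minimal sketch for categorical variables. It assumes a plain Pearson chi-square that compares the category counts observed in one cluster with the counts expected from the whole population. The product's internal calculation is not described here and may differ. The function names, variable names, and sample data are all hypothetical.

```python
from collections import Counter


def chi_square(cluster_values, population_values):
    """Pearson chi-square of a cluster's category counts against the counts
    expected if the cluster had the same category mix as the population."""
    if not cluster_values:
        raise ValueError("the cluster has no members")
    n_cluster = len(cluster_values)
    n_population = len(population_values)
    observed = Counter(cluster_values)
    statistic = 0.0
    for category, pop_count in Counter(population_values).items():
        expected = n_cluster * pop_count / n_population
        statistic += (observed.get(category, 0) - expected) ** 2 / expected
    return statistic


def rank_variables(records, cluster_ids, cluster):
    """Return (variable, chi-square) pairs for one cluster, with the
    variable that differs most from the population first."""
    members = [r for r, c in zip(records, cluster_ids) if c == cluster]
    scores = {
        variable: chi_square(
            [r[variable] for r in members],
            [r[variable] for r in records],
        )
        for variable in records[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: eight customers, two categorical variables.
shopper_types = ["family"] * 4 + ["single"] * 4
card_flags = ["yes", "no"] * 4
records = [
    {"shopper_type": s, "pays_by_card": c}
    for s, c in zip(shopper_types, card_flags)
]
cluster_ids = [8, 8, 8, 8, 3, 3, 3, 3]

ranking = rank_variables(records, cluster_ids, 8)
print(ranking)
```

In this made-up data every customer in cluster 8 is a family shopper, while card usage in the cluster matches the population exactly, so shopper_type is ranked first and pays_by_card last.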
Figure 4-10 shows the expanded visualized result for Cluster 8 from our
demographic cluster results. In expanding the view we have also included some
of the characteristic variables that were not used to produce the cluster, because
they help us to interpret the results.
Figure 4-10 Demographic cluster 8
This cluster is the first of the two clusters that contained what our business rules
defined as Family Shoppers. You will remember that the business rule for this