Chapter 5. Can you predict the customers who are likely to leave? 81
5.3 Sourcing and preprocessing the data
To create our data model we have to take the raw data that we collect and
convert it into the format required by the data models. We call this stage
sourcing and preprocessing the data, and it is the third stage in our data
mining method.
But before sourcing the data into one integrated table, view, or flat file, which
is the format that data mining requires, churn prediction needs additional
consideration because prediction modeling, by its nature, predicts the future
based on the past.
Determine time window
When sourcing the data defined so far, you must specify the time frame from
which the data is to be gathered.
     Variable name        Description
25   Total_dur            Total minutes of calls
26   Inbound_dur          Duration of inbound calls
27   Discount_share       Discount calls (as a share of regular calls)
28   Complet_call         Number of calls completed over 3 months
BILLING / PAYMENT
29   Revenue              Revenue
30   Bill_amt             Amount of bill
31   Pay_delayed_before   Number of times the payment was delayed
DERIVED INDICS
32   Outsphere            Number of different telephone numbers for outbound calls
33   Mobility             Number of network cells visited during calls
34   Concentration        Calls to the top 2 most frequently used phone numbers
                          as a share of total calls
35   Quality              Successful calls relative to failed calls
36   Call_trend           N-month slope of the minutes of calls
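The Call_trend indicator is described only as an N-month slope of the minutes of calls. As an illustration, the following minimal sketch assumes that the slope is the ordinary least-squares slope of monthly minutes against the month index; the function name call_trend and the sample figures are hypothetical and are not taken from the study.

```python
def call_trend(minutes):
    """Least-squares slope of monthly call minutes against month index 0..N-1.

    Assumption: "slope" means the ordinary least-squares slope; the text
    does not say how the indicator was actually computed.
    """
    n = len(minutes)
    if n < 2:
        raise ValueError("need at least two months to compute a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(minutes) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(minutes))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Hypothetical customer whose usage drops by 10 minutes every month.
declining = call_trend([100, 90, 80, 70, 60, 50])
# Hypothetical customer with flat usage.
flat = call_trend([75, 75, 75, 75, 75, 75])
```

A negative value signals falling usage over the window and a value near zero signals stable usage.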
82 Mining Your Own Business in Telecoms Using DB2 Intelligent Miner for Data
You should define the following three items to decide which time frame of
customer data and churn information is going to be used in the model.
• Data window: The time frame for the input variables that are used for
constructing the model.
• Forecasting window: The time frame for the prediction, used when sourcing
the target prediction variable (the churn indicator). The churn prediction
model is often referred to as a WHO and WHEN model, which means that it
tries to answer the questions: who is going to leave the company, and when
are they going to leave? The forecasting window is the WHEN part of churn
prediction modeling. In the model building phase, the forecasting window is
the time frame in which we examine whether the customers left the company
or not.
• Time lag: The interval between the data window and the forecasting window.
In this case, we used six months as a data window, two months as a time lag and
one month as a forecasting window, as shown in Figure 5-1.
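To make the role of the forecasting window concrete, the following minimal sketch derives a churn indicator from a customer's leaving date. The function name churn_indicator, the dates, and the year are placeholders. The text does not say how to treat customers who leave during the time lag; this sketch labels them 0, which is an assumption, and a real project might exclude them instead.

```python
from datetime import date


def churn_indicator(churn_date, forecast_start, forecast_end):
    """Return 1 if the customer left inside the forecasting window, else 0.

    `churn_date` is None for customers who have not left.
    Assumption: customers who leave outside the window (for example,
    during the time lag) are labeled 0 rather than excluded.
    """
    if churn_date is None:
        return 0
    return int(forecast_start <= churn_date <= forecast_end)


# Placeholder one-month forecasting window; the year is hypothetical.
window_start, window_end = date(2001, 10, 1), date(2001, 10, 31)

left_in_window = churn_indicator(date(2001, 10, 12), window_start, window_end)
still_active = churn_indicator(None, window_start, window_end)
left_in_lag = churn_indicator(date(2001, 9, 3), window_start, window_end)
```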
In the model building phase, six months of historical data, from February to
July, for customers who are active as of the end of July is used together with
churn information, that is, whether or not these customers left the company in
October. The resulting model can then be applied to customers who are active as
of the end of August to predict probable churners in November.
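The month arithmetic above can be checked with a short sketch. It lays out a six-month data window, a two-month time lag, and a one-month forecasting window around the last month of the data window. The names Windows, add_months, and build_windows are hypothetical, and the year 2001 is a placeholder because the text names months only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Windows:
    data_months: list       # months supplying the input variables
    lag_months: list        # gap between the data and forecasting windows
    forecast_months: list   # months in which churn is observed or predicted


def add_months(year, month, offset):
    """Return the (year, month) that lies `offset` months after (year, month)."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def build_windows(as_of, data_len=6, lag_len=2, forecast_len=1):
    """Lay out the three windows around `as_of`, the last month of the data window."""
    year, month = as_of
    data = [add_months(year, month, k) for k in range(-data_len + 1, 1)]
    lag = [add_months(year, month, k) for k in range(1, lag_len + 1)]
    forecast = [add_months(year, month, k)
                for k in range(lag_len + 1, lag_len + forecast_len + 1)]
    return Windows(data, lag, forecast)


# Model building: customers active as of the end of July.
training = build_windows((2001, 7))
# Model application: customers active as of the end of August.
scoring = build_windows((2001, 8))
```

With these settings, the training layout runs from February to July with October as the forecasting month, and the scoring layout shifts every window forward by one month, so its forecasting month is November.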
Therefore, in early September, marketing personnel can get the list of customers
who are likely to leave the company in November, and a two-month time frame is
available for them to set up and execute the proper marketing actions.
You can decide on a data window after studying historical churn patterns. You
should avoid time frames that show abnormal patterns due to external impacts.
The time frame of the latest data available for building the prediction model
is a good example of a data window.
