36 Mining Your Own Business in Telecoms Using DB2 Intelligent Miner for Data
Use of common data models
Defining a data model for any application is often a complex task, and defining
one for data mining is no exception. Where the data model must support an
application with specific requirements (for example, some form of business
reporting tool), the data can be defined by asking the end users what types of
information they require and then performing the aggregations needed to support
those requirements. In data mining, the challenge is that you often do not know
at the outset which variables are important, and therefore exactly what data is
required. Generating data models for completely new data mining applications can
therefore be a time-consuming activity.
The alternative is to use common data models that have been developed to solve
business issues similar to the ones you are trying to address. Although these
models may not initially provide all of the information you require, they are
usually designed to be extensible so that additional variables can be included.
The main advantage of using a common data model is that it lets you see quickly
how data mining can be used within your business. In the following chapters we
suggest some simple data models that can be used in this way.
3.4.3 Step 3 — Sourcing and preprocessing the data
The third step in the generic data mining method is the sourcing and
preprocessing of the data that populates the data model. Having a defined data
model provides the necessary structure, in terms of the variables that we are
going to mine, but we still have to provide the data.
Data sourcing and preprocessing comprises the stages of identifying, collecting,
filtering, and aggregating (raw) data into the format required by the data model
and the selected mining function. Because sourcing and preparing the data are
the most time-consuming parts of any data mining project, we describe these
crucial steps in more detail. Where the data is derived from a data warehouse,
many of these stages will already have been performed.
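
As a rough illustration of the filtering and aggregating stages, the following
minimal Python sketch turns a handful of raw call records into one row per
customer. The record layout, the field names (customer_id, call_type,
duration_seconds), the sample values, and the filter rule are all assumptions
made for this example; they are not taken from the data models described in
this book, and the sketch does not use Intelligent Miner for Data.

```python
from collections import defaultdict

# Hypothetical raw call records: (customer_id, call_type, duration_seconds).
# The layout and the values are invented purely for illustration.
raw_calls = [
    ("C001", "local", 120),
    ("C001", "international", 300),
    ("C002", "local", 0),  # zero-length record, removed by the filter below
    ("C002", "local", 60),
    ("C001", "local", 30),
]


def filter_calls(records):
    """Filtering: keep only records with a positive duration (assumed rule)."""
    return [record for record in records if record[2] > 0]


def aggregate_by_customer(records):
    """Aggregating: build one row per customer with call count and total duration."""
    totals = defaultdict(lambda: {"n_calls": 0, "total_seconds": 0})
    for customer_id, _call_type, duration in records:
        totals[customer_id]["n_calls"] += 1
        totals[customer_id]["total_seconds"] += duration
    return dict(totals)


# One row per customer, in the shape a customer-level data model might expect.
customer_table = aggregate_by_customer(filter_calls(raw_calls))
```

In a real project the filter rules and the aggregated variables would be
dictated by the chosen data model and mining function rather than by the
simple choices assumed here.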
The data sources
The data sources can differ in origin and content, as shown in Figure 3-7.