Chapter 3. A generic data mining method 31
dig, but if the gold could only be extracted by “panning” or some other technique,
you may spend a lot of time throwing away the valuable material in a fruitless
search: searching for the right thing in the wrong place, or with the wrong
technique.
Data mining is about choosing the right tools for the job and then using them
skillfully to discover the information in your data. We have already seen that there
are a number of tools that can be used, and that very often we have to combine
the tools at our disposal if we are to make real discoveries and extract the value
from our data.
The first step in our data mining method is therefore to identify the business
issue that you want to address and then determine how the business issue can
be translated into a question, or set of questions, that data mining can address.
By a business issue we mean an identified problem to which you need an answer,
where you suspect, or know, that the answer is buried somewhere in the data, but
you are not sure where it is.
A business issue should have:
- A clear description of the problem to be addressed
- An understanding of the data that might be relevant
- A vision for how you are going to use the mining results in your business
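As an illustration only, the three requirements above can be treated as a checklist that must be complete before any mining starts. The sketch below assumes a hypothetical `BusinessIssue` record; its field names and the telecoms churn example (including the listed data sources) are invented for this illustration and are not part of the method as described. It also records the data mining questions derived from the issue, since the first step calls for translating the issue into one or more such questions.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessIssue:
    """Hypothetical checklist for a business issue (illustrative only)."""
    problem: str              # clear description of the problem to be addressed
    relevant_data: list       # data sources that might be relevant
    intended_use: str         # how the mining results will be used in the business
    questions: list = field(default_factory=list)  # questions data mining can address

    def missing_requirements(self) -> list:
        """Return the names of the requirements that are still empty."""
        missing = []
        if not self.problem.strip():
            missing.append("problem")
        if not self.relevant_data:
            missing.append("relevant_data")
        if not self.intended_use.strip():
            missing.append("intended_use")
        if not self.questions:
            missing.append("questions")
        return missing

    def is_well_defined(self) -> bool:
        """True only when every requirement has been filled in."""
        return not self.missing_requirements()


# Hypothetical example: a customer retention issue in telecoms.
churn = BusinessIssue(
    problem="Too many subscribers cancel their contracts each quarter.",
    relevant_data=["billing history", "call detail records", "customer service contacts"],
    intended_use="Target retention offers at subscribers most likely to leave.",
    questions=["Which current subscribers are likely to leave in the next three months?"],
)
print(churn.is_well_defined())  # True

vague = BusinessIssue(problem="Sales are down.", relevant_data=[], intended_use="")
print(vague.missing_requirements())  # ['relevant_data', 'intended_use', 'questions']
```

The point of the design is simply that a vague issue such as "sales are down" fails the check until the relevant data, the intended use, and at least one answerable question have been identified.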
Describing the problem
If you are not sure what questions data mining can address, then the best
approach is to look at examples of where it has been used successfully, either in
your own industry or in related industries. Many business and research fields
have proven to be excellent candidates for data mining. Most applications are in
banking, insurance, retail and telecommunications (telecoms), but significant
benefits have also been derived in many other fields, such as manufacturing,
pharmaceuticals and biotechnology.
Well-known applications include customer profiling and cross-selling in retail, loan
delinquency and fraud detection in banking and finance, customer retention
(attrition and churn) in telecoms, and patient profiling and weight rating for
Diagnosis Related Groups in health care. Some of these are depicted in
Figure 3-5.
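The advice to start from examples in your own industry, then in related ones, amounts to a simple lookup. The sketch below is illustrative only: the table contains just the example applications named in the paragraph above, and the `find_examples` helper and its `related` parameter are hypothetical names introduced here, not part of any tool.

```python
# Example applications named in the text, grouped by industry (illustrative only).
KNOWN_APPLICATIONS = {
    "retail": ["customer profiling", "cross-selling"],
    "banking and finance": ["loan delinquency", "fraud detection"],
    "telecoms": ["customer retention (attrition and churn)"],
    "health care": ["patient profiling", "weight rating for Diagnosis Related Groups"],
}


def find_examples(industry, related=()):
    """Hypothetical helper: list known applications for an industry first,
    then for any related industries, in the order given."""
    examples = []
    for name in (industry, *related):
        examples.extend(KNOWN_APPLICATIONS.get(name, []))
    return examples


# An insurer with no entry of its own falls back on a related industry.
print(find_examples("insurance", related=["banking and finance"]))
# ['loan delinquency', 'fraud detection']
```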