216 Enhance Your Business Applications: Simple Integration of Advanced Data Mining Functions
10.4 Specifying mining data
This step specifies the name and the columns of the table that contains your
training data set. This is achieved by storing the settings for the table to be
used for IM Modeling in the IDMMX.MININGDATA table.
Table 10-5 lists the methods for specifying the mining data.
Table 10-5 Methods for defining mining data
Method              Description                                           Input
DM_defMiningData    Defines a table as input for mining                   Table name
DM_SetColumns       Defines the individual columns for the modeling API   Column name, data type
[Figure: Mining Algorithm Stored Procedure. Example call fragments:
call redbook.BuildRuleModel('My_ProductMix_Model','CUSTOMER_PRODUCTMIX',
RuleSettings('CUSTOMER_PRODUCTMIX') and ..DM_setMaxNumClus(6)..DM_expClusSettings()]
Example:
insert into IDMMX.miningdata
values ('Connection',
IDMMX.DM_MiningData()..DM_defMiningData('CONNECTION_TABLE')..
DM_SetColumns('
<Column name="SUM_DUR" sqlType="DOUBLE"/>
<Column name="NO_CALLS" sqlType="INTEGER"/>
<Column name="REL_DUR" sqlType="DOUBLE"/>
<Column name="SUM_COST" sqlType="DOUBLE"/>
<Column name="MAX_DUR" sqlType="DOUBLE"/>
<Column name="VAR_DUR" sqlType="DOUBLE"/>
<Column name="NO_CLRS" sqlType="INTEGER"/>
<Column name="CALLER_ID" sqlType="CHARACTER"/>
<Column name="PREMIUM_ID" sqlType="CHARACTER"/>'));
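A quick way to confirm that the definition was stored is to query the table for the new ID. This is only a minimal sketch: it assumes that the first column of IDMMX.MiningData is named ID, which is how the later examples in this chapter select a mining data row.

```sql
-- Check that the mining data definition from the example was stored.
-- 'Connection' is the ID used in the insert statement above.
-- Assumption: the key column of IDMMX.MiningData is named ID.
select ID
from IDMMX.MiningData
where ID='Connection';
```

If the insert succeeded, the query should return one row.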
Chapter 10. Building the mining models using IM Modeling functions 217
10.4.1 Defining mining settings
This step includes the following substeps:
1. Generate the logical data specification from the mining data defined in the
previous step.
2. Add further parameter settings for the mining run. Because different mining
algorithms require different settings, this step uses different UDFs
and methods for different algorithms.
Table 10-6 lists the algorithms and their more frequently used methods. It is
not an exhaustive list of the methods available for each algorithm.
An example follows for each algorithm. The examples build the data mining
settings with a table-driven approach, which is only one of the ways to build
them.
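As a minimal sketch of the two substeps, the following statement reads a mining data definition from IDMMX.MiningData, generates its logical data specification, and then chains algorithm-specific settings for an association run. It uses only methods that appear in this section. The settings ID 'Basket_Rules' is a hypothetical name, and the statement assumes that a mining data row with the ID 'Market_Basket' already exists.

```sql
-- Substep 1: DM_genDataSpec() derives the logical data spec from the
--            stored mining data definition (column MININGDATA).
-- Substep 2: the DM_set... methods add the algorithm-specific parameters.
-- 'Basket_Rules' is a hypothetical settings ID.
insert into IDMMX.RuleSettings
select 'Basket_Rules',
       IDMMX.DM_RuleSettings()..
       DM_useRuleDataSpec(MININGDATA..DM_genDataSpec())..
       DM_setGroup('TRANSACTION_ID')..
       DM_setItemFld('ITEM_NO')..
       DM_setMinSupport(70)
from IDMMX.MiningData where ID='Market_Basket';
```

Because the settings are built by a select against IDMMX.MiningData, the same statement can be reused for another table by changing only the ID in the where clause.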
Table 10-6 UDFs and frequently used methods for defining mining settings
Algorithm   Method              Description                                           Input                                   Input data type
All         DM_genDataSpec      Generate the logical data spec for the data table.    None
            DM_addNmp           Generate name mapping.                                Name of mapping to create               varchar
            DM_setFldNmp        Set name mapping active.                              Name of an existing name mapping        varchar
            DM_setPowerOptions  Set the power option specific to an algorithm.        Power option specific to an algorithm   varchar
Association DM_setItemFld       Assign the role of item to an input column.           Name of the input column                varchar
            DM_setGroup         Assign the role of a group to an input column.        Name of group (name of input column)    varchar
                                Typically this is the transaction ID or customer ID.
            DM_setMaxLen        Set the maximum length of a rule.                     Length                                  integer
            DM_setMinSupport    Set a minimum support threshold for rules.            Minimum support                         integer
            DM_setMinConf       Set a minimum confidence threshold for rules.         Minimum confidence                      integer
            DM_addTax           Create a taxonomy.                                    Name of taxonomy to be created          varchar
            DM_setConType       Set a constraint type.                                0 for exclusive, 1 for inclusive        integer
            DM_addConItem       Add a constraint item.                                Item to be included                     varchar
            DM_remConItem       Remove a constraint item.                             Item to be removed                      varchar
            DM_setFldTax        Set taxonomy to active.                               Field name, name of taxonomy            varchar, varchar
Example:
insert into IDMMX.RuleSettings
select 'Connection_Segmentation',
IDMMX.DM_RuleSettings()..
DM_useRuleDataSpec(MININGDATA..DM_genDataSpec()..
  DM_setFldType('TRANSACTION_ID',0)..
  DM_setFldType('ITEM_NO',0)..
  DM_addNmp('newName','shop_1.transactions','SK2_code',
            'product_name')..
  DM_setFldNmp('item','newName')..
  DM_addTax('Taxonomy_1','shop_1.prod_hierarchy','ITEM_NO',
            'prod_group', cast(NULL as char),1)..
  DM_setFldTax('ITEM_NO','Taxonomy_1'))..
DM_setMinSupport(70)..
DM_setGroup('TRANSACTION_ID')..
DM_setItemFld('ITEM_NO')
from IDMMX.MiningData where ID='Market_Basket';
Algorithm   Method              Description                                           Input                                   Input data type
Tree        DM_setTreeClasPar   Set maximum purity.                                   MaxPur, value                           keyword, integer
                                Set maximum tree depth.                               MaxDth, value                           keyword, float
                                Set a minimum number of records per internal node.    MinRec, value                           keyword, integer
            DM_setCostMat       Specify a cost matrix for cost of misclassification.  Refer to Administration Guide           Refer to Administration Guide
            DM_setClasTarget    Specify the target field.                             Target field                            varchar
Example:
insert into IDMMX.ClasSettings
select 'Churn_Classification',
IDMMX.DM_ClasSettings()..
DM_useClasDataSpec(MININGDATA..DM_genDataSpec())..
DM_setClasTarget('CHURN_FLAG')..
DM_setTreeClasPar('MaxPur',95)..
DM_setTreeClasPar('MaxDth',6)..
DM_setTreeClasPar('MinRec',5)..
DM_setCostMat('CUSTOMER.COSTMAT','ACTUAL','PREDICTED','WEIGHT')
from IDMMX.MiningData where ID='Customer_Churn';
Algorithm   Method              Description                                           Input                                   Input data type
Clustering  DM_setDClusPar      Set the value weighting for a field.                  Refer to reference guide                Refer to reference guide
            DM_setMaxNumClus    Set the maximum number of clusters allowed.           Maximum number of clusters              integer
            DM_setFldSimScale   Set the field similarity scale.                       Field name, similarity scale            varchar, double
            DM_setFldOutTreat   Set the treatment of outliers.                        Field, treatment                        varchar, integer (1, 2, 3)
            DM_addSimMat        Add a similarity matrix to the settings.              Refer to reference guide                Refer to reference guide
            DM_setExecTime      Set the maximum execution time.                       Execution time in minutes               integer
            DM_setMinData       Set the minimum percentage of data that the           Percentage                              double (0-100)
                                clustering run must process.
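The clustering methods can be chained in the same table-driven way as the rule and classification settings. The statement below is a sketch under stated assumptions, not a confirmed call sequence. The names IDMMX.ClusSettings, DM_ClusSettings, and DM_useClusDataSpec are assumed by analogy with the rule and classification examples. The spelling DM_setMaxNumClus follows the stored procedure fragment shown in section 10.4. The settings ID 'Connection_Clusters' and the parameter values are hypothetical.

```sql
-- Sketch only: IDMMX.ClusSettings, DM_ClusSettings and DM_useClusDataSpec
-- are assumed names, formed by analogy with the rule and classification
-- examples. 'Connection_Clusters' is a hypothetical settings ID.
-- Parameters: at most 6 clusters, at most 30 minutes of execution time,
-- and at least 80 percent of the data processed.
insert into IDMMX.ClusSettings
select 'Connection_Clusters',
       IDMMX.DM_ClusSettings()..
       DM_useClusDataSpec(MININGDATA..DM_genDataSpec())..
       DM_setMaxNumClus(6)..
       DM_setExecTime(30)..
       DM_setMinData(80)
from IDMMX.MiningData where ID='Connection';
```

Check the exact method names and argument types against the reference guide before using the statement.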