Chapter 7. The value of DB2 Intelligent Miner For Data 175
7.2.2 Statistical functions
Although the data mining tools are designed to discover information in the
data, you often need to understand the structure of the data, for example
outlying values or highly correlated features, before the full power of the
mining techniques can be realized. Therefore, after transforming the data, the
next stage is usually to analyze it. IM for Data provides a range of statistical
functions that support the analysis and preparation of data and also provide
forecasting capabilities. For example, you can apply statistical functions such
as regression to understand hidden relationships in the data, or use factor
analysis to reduce the number of input variables. The statistical functions
included are:
Factor analysis: Discovers the relationships among many variables in terms of
a few underlying, but unobservable, quantities called factors.
Linear regression: Used to determine the best linear relationship between the
dependent variable and one or more independent variables.
Polynomial regression: Used to determine the best polynomial relationship
between the dependent variable and one or more independent variables.
Principal component analysis: Used to rotate a coordinate system so that the
axes better match the data distribution. The data can then be described with
fewer dimensions (axes) than before.
Univariate curve fitting: Finds a mathematical function that closely describes
the distribution of your data.
Univariate and bivariate statistics: Provide descriptive statistics such as
means, variances, medians, and quantiles.
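To make the idea of analyzing data before mining more concrete, the following
minimal sketch computes a few univariate statistics, a bivariate correlation,
and a simple linear regression in plain Python. It does not use IM for Data or
any of its interfaces. The function names describe, correlation, and fit_line
are hypothetical and chosen only for this illustration.

```python
import math
import statistics


def describe(values):
    """Univariate statistics: mean, sample variance, median, quartiles."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "median": statistics.median(values),
        "quartiles": tuple(statistics.quantiles(values, n=4)),
    }


def correlation(xs, ys):
    """Bivariate statistic: Pearson correlation between two fields."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (one independent variable)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Illustrative data only: ys was generated as 2 * x + 1.
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]
    print(describe(ys))
    print("correlation:", correlation(xs, ys))
    print("slope, intercept:", fit_line(xs, ys))
```

A correlation close to 1 or -1 between two input fields suggests that they
carry largely the same information, which is the kind of finding that can
motivate reducing the number of input variables before mining.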
7.2.3 Mining functions
All of the mining functions can be customized at two levels of expertise. Users
who are not experts can accept the defaults and suppress the advanced settings.
Experienced users who want to fine-tune their application can customize all
settings according to their requirements. It is also possible to define the
mode in which a mining function is run. The possible modes are:
Training mode: In which a mining function builds a model based on the
selected input data.
Test mode: In which a mining function uses new data with known results to
verify that the model created in training mode produces adequate results.
Application mode: In which a mining function uses a model created in training
mode to predict the specified field for every record in the new input data.
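The relationship among the three modes can be sketched in a few lines of plain
Python. This is an illustration under stated assumptions, not the IM for Data
interface: the model is a deliberately simple nearest-class-mean classifier on
one numeric field, and the function names, the field names income and risk,
and the sample records are all hypothetical.

```python
from collections import defaultdict


def run_training_mode(records, feature, target):
    """Training mode: build a model from the selected input data.

    The model here is just the mean of `feature` for each value of `target`.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[target]].append(record[feature])
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def predict(model, value):
    """Return the label whose class mean is closest to `value`."""
    return min(model, key=lambda label: abs(model[label] - value))


def run_test_mode(model, records, feature, target):
    """Test mode: score the model on new data with known results (accuracy)."""
    hits = sum(predict(model, r[feature]) == r[target] for r in records)
    return hits / len(records)


def run_application_mode(model, records, feature, target):
    """Application mode: predict the target field for every new record."""
    return [{**r, target: predict(model, r[feature])} for r in records]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    training = [
        {"income": 20, "risk": "high"},
        {"income": 30, "risk": "high"},
        {"income": 70, "risk": "low"},
        {"income": 90, "risk": "low"},
    ]
    holdout = [{"income": 25, "risk": "high"}, {"income": 85, "risk": "low"}]
    new_data = [{"income": 40}, {"income": 75}]

    model = run_training_mode(training, "income", "risk")
    print("accuracy:", run_test_mode(model, holdout, "income", "risk"))
    print(run_application_mode(model, new_data, "income", "risk"))
```

The point of the sketch is the separation of concerns: the model is built once
in training mode, checked against records whose outcome is already known in
test mode, and only then used in application mode on records that have no
value for the predicted field.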