Chapter 4
Explore the Data and Replace Input Values
About the Tasks That You Will Perform
Generate Descriptive Statistics
Partition the Data
Replace Missing Values
About the Tasks That You Will Perform
You have already set up the project and defined the input data source that you will use in
this example. Now, you will import the data and perform the following tasks, which help
you learn properties of the input data and prepare it for subsequent modeling:
1. You will explore the statistical properties of the variables in the input data set. The
results that are generated in this step will give you an idea of which variables are
most useful in predicting the target response (whether a person donates or not) in this
data set.
2. You will partition the data into two data sets: a training data set and a validation
data set. Such partitioning is common practice in data mining. You fit the model on the
training data and use the validation data to check that it is not overfitted to one
particular set of data.
3. You will specify how SAS Enterprise Miner should handle missing values of
predictor variables.
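For readers who think in code, the partitioning and missing-value steps above can be sketched outside of SAS Enterprise Miner in plain Python. This is only an illustration of the idea, not what the software does internally. The records, the column names (age, donated), the 70/30 split, and the choice of mean replacement are all assumptions made for this example.

```python
import random
import statistics


def partition(rows, train_percent=70, seed=42):
    """Shuffle a copy of rows and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def impute_mean(train, valid, column):
    """Replace None in `column` with the mean of the observed training values."""
    observed = [r[column] for r in train if r[column] is not None]
    fill = statistics.mean(observed)

    def fix(rows):
        return [{**r, column: fill if r[column] is None else r[column]}
                for r in rows]

    return fix(train), fix(valid), fill


# Hypothetical donor records; None marks a missing predictor value.
rows = [{"age": a, "donated": d} for a, d in
        [(34, 1), (None, 0), (51, 1), (45, 0), (None, 1),
         (29, 0), (62, 1), (38, 0), (47, 0), (55, 1)]]

train, valid = partition(rows)
train, valid, fill = impute_mean(train, valid, "age")
print(len(train), len(valid), round(fill, 2))
```

The replacement value is computed from the training rows only and then applied to both partitions, so no information from the validation data influences the value used during model fitting.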
TIP It is always a good idea to plot the input data and to check it for missing values
before you proceed to model building. Knowing the statistical properties of your
input data is essential for building an accurate and robust predictive model.
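As a rough picture of the kind of check the tip describes, the following Python sketch counts missing values and computes a mean and standard deviation for each variable. It is a hand-rolled illustration under stated assumptions, not the output or algorithm of any SAS Enterprise Miner node. The records and the column names (age, gift) are hypothetical.

```python
import statistics


def summarize(rows, column):
    """Return row count, missing count, mean, and standard deviation for one column."""
    values = [r[column] for r in rows]
    observed = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "stdev": statistics.stdev(observed),
    }


# Hypothetical input records; None marks a missing value.
rows = [
    {"age": 34, "gift": 20.0},
    {"age": None, "gift": 15.0},
    {"age": 51, "gift": None},
    {"age": 45, "gift": 25.0},
]

for column in ("age", "gift"):
    print(column, summarize(rows, column))
```

A variable with a large missing count, or with a spread that looks implausible, is a signal to look more closely before modeling.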
Generate Descriptive Statistics
To use the StatExplore node to produce a statistical summary of the input data:
1. Select the Explore tab on the Toolbar.
2. Select the StatExplore node icon. Drag the node into the Diagram Workspace.