Chapter 2. Data Preparation – Select

In this chapter, we will cover:

  • Using the Feature Selection node creatively to remove or decapitate perfect predictors
  • Running a Statistics node on an anti-join to evaluate the potential missing data
  • Evaluating the use of sampling for speed
  • Removing redundant variables using correlation matrices
  • Selecting variables using the CHAID Modeling node
  • Selecting variables using the Means node
  • Selecting variables using single-antecedent Association Rules

Introduction

This chapter focuses on just the first task of the data preparation phase, Select:

Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types. Note ...
