IBM SPSS Modeler Cookbook by Scott Mutchler, Tom Khabaza, Meta S. Brown, Dean Abbott, Keith McCormick

Chapter 3. Data Preparation – Clean

In this chapter, we will cover:

  • Binning scale variables to address missing data
  • Using a full data model/partial data model approach to address missing data
  • Imputing in-stream mean or median
  • Imputing missing values randomly from uniform or normal distributions
  • Using random imputation to match a variable's distribution
  • Searching for similar records using a Neural Network for inexact matching
  • Using neuro-fuzzy searching to find similar names
  • Producing longer Soundex codes

Introduction

This chapter addresses the clean subtask of the data preparation phase. CRISP-DM describes this subtask in the following way:

Raise the data quality to the level required by the selected analysis techniques. This may involve selection of clean ...
