Chapter 1. Data Understanding

In this chapter, we will cover:

  • Using an empty aggregate to evaluate sample size
  • Evaluating the need to sample from the initial data
  • Using CHAID stumps when interviewing an SME
  • Using a single-cluster K-means as an alternative to anomaly detection
  • Using an @NULL multiple Derive to explore missing data
  • Creating an Outliers report to give to SMEs
  • Detecting potential model instability early using the Partition node and Feature Selection node

Introduction

This opening chapter is about data understanding, but data understanding is not the first phase of CRISP-DM. That place belongs to business understanding, which is a critical phase in its own right. Some would argue, including the authors of this book, that business understanding is the phase most in need of more attention by ...
