IBM SPSS Modeler Essentials

Book Description

Get to grips with the fundamentals of data mining and predictive analytics with IBM SPSS Modeler

About This Book

  • Get up and running with IBM SPSS Modeler without going into too much depth
  • Identify interesting relationships within your data and build effective data mining and predictive analytics solutions
  • A quick, easy-to-follow guide to give you a fundamental understanding of SPSS Modeler, written by the best in the business

Who This Book Is For

This book is ideal for those who are new to SPSS Modeler and want to start using it as quickly as possible, without going into too much detail. An understanding of basic data mining concepts will help you get the most out of the book.

What You Will Learn

  • Understand the basics of data mining and familiarize yourself with Modeler's visual programming interface
  • Import data into Modeler and learn how to properly declare metadata
  • Obtain summary statistics and audit the quality of your data
  • Prepare data for modeling by selecting and sorting cases, identifying and removing duplicates, combining data files, and modifying and creating fields
  • Assess simple relationships using various statistical and graphing techniques
  • Get an overview of the different types of models available in Modeler
  • Build a decision tree model and assess its results
  • Score new data and export predictions

In Detail

IBM SPSS Modeler allows you to quickly and efficiently use predictive analytics and gain insights from your data. With almost 25 years of history, Modeler is the most established and comprehensive data mining workbench available. Since it is popular in corporate settings, widely available in university settings, and highly compatible with all the latest technologies, it is the perfect way to start your data science and machine learning journey.

This book takes a detailed, step-by-step approach to introducing data mining using the de facto standard process, CRISP-DM, and Modeler's easy-to-learn "visual programming" style. You will learn how to read data into Modeler, assess data quality, prepare your data for modeling, find interesting patterns and relationships within your data, and export your predictions. Using a single case study throughout, this intentionally short and focused book sticks to the essentials. The authors have drawn upon their decades of teaching thousands of new users to choose those aspects of Modeler that you should learn first, so that you get off to a good start using proven best practices.

This book provides an overview of various popular data modeling techniques and presents a detailed case study of how to use CHAID, a decision tree model. Assessing a model's performance is as important as building it; this book will also show you how to do that. Finally, you will see how you can score new data and export your predictions. By the end of this book, you will have a firm understanding of the basics of data mining and how to effectively use Modeler to build predictive models.

Style and approach

This book empowers users to build practical and accurate predictive models quickly and intuitively. With the support of advanced analytics, users can discover hidden patterns and trends. This helps them understand the factors that influence those patterns and trends, enabling them to take advantage of business opportunities and mitigate risks.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Introduction to Data Mining and Predictive Analytics
    1. Introduction to data mining
    2. CRISP-DM overview
      1. Business Understanding
      2. Data Understanding
      3. Data Preparation
      4. Modeling
      5. Evaluation
      6. Deployment
        1. Learning more about CRISP-DM
    3. The data mining process (as a case study)
    4. Summary
  3. The Basics of Using IBM SPSS Modeler
    1. Introducing the Modeler graphic user interface
      1. Stream canvas
      2. Palettes
      3. Modeler menus
      4. Toolbar
      5. Manager tabs
      6. Project window
    2. Building streams
      1. Mouse buttons
      2. Adding nodes
      3. Editing nodes
      4. Deleting nodes
      5. Building a stream
      6. Connecting nodes
      7. Deleting connections
    3. Modeler stream rules
    4. Help options
      1. Help menu
      2. Dialog help
    5. Summary
  4. Importing Data into Modeler
    1. Data structure
      1. Var. File source node
      2. Var. File source node File tab
      3. Var. File source node Data tab
      4. Var. File source node Filter tab
      5. Var. File source node Types tab
      6. Var. File source node Annotations tab
      7. Viewing data
      8. Excel source node
      9. Database source node
    2. Levels of measurement and roles
    3. Summary
  5. Data Quality and Exploration
    1. Data Audit node options
      1. Data Audit node results
        1. The Quality tab
      2. Missing data
        1. Ways to address missing data
        2. Defining missing values in the Type node
        3. Imputing missing values with the Data Audit node
    2. Summary
  6. Cleaning and Selecting Data
    1. Selecting cases
      1. Expression Builder
    2. Sorting cases
    3. Identifying and removing duplicate cases
    4. Reclassifying categorical values
    5. Summary
  7. Combining Data Files
    1. Combining data files with the Append node
    2. Removing fields with the Filter node
    3. Combining data files with the Merge node
      1. The Filter tab
      2. The Optimization tab
    4. Summary
  8. Deriving New Fields
    1. Derive – Formula
    2. Derive – Flag
    3. Derive – Nominal
    4. Derive – Conditional
    5. Summary
  9. Looking for Relationships Between Fields
    1. Relationships between categorical fields
      1. Distribution node
      2. Matrix node
    2. Relationships between categorical and continuous fields
      1. Histogram node
      2. Means node
    3. Relationships between continuous fields
      1. Plot node
      2. Statistics node
    4. Summary
  10. Introduction to Modeling Options in IBM SPSS Modeler
    1. Classification
      1. Categorical targets
      2. Numeric targets
      3. The Auto nodes
      4. Data reduction modeling nodes
    2. Association
    3. Segmentation
      1. Choosing between models
    4. Summary
  11. Decision Tree Models
    1. Decision tree theory
    2. CHAID theory
      1. How CHAID processes different types of input variables
      2. Stopping rules
      3. Building a CHAID Model
      4. Partition node
      5. Overfitting
      6. CHAID dialog options
    3. CHAID results
      1. Summary
  12. Model Assessment and Scoring
    1. Contrasting model assessment with the Evaluation phase
      1. Model assessment using the Analysis node
      2. Modifying CHAID settings
      3. Model comparison using the Analysis node
      4. Model assessment and comparison using the Evaluation node
      5. Scoring new data
      6. Exporting predictions
    2. Summary