Book description
If you’ve already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the potential of this data mining workbench. The recipes come from some of the best brains in the business.
- Go beyond mere insight and build models that you can deploy in the day-to-day running of your business
- Save time and effort while getting more value from your data than ever before
- Loaded with detailed step-by-step examples that show you exactly how it’s done by the best in the business
In Detail
IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork.
IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art.
Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace.
Go beyond the basics and get the full power of your data mining workbench with this practical guide.
Table of contents
IBM SPSS Modeler Cookbook
- Table of Contents
- IBM SPSS Modeler Cookbook
- Credits
- Foreword
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Preface
1. Data Understanding
- Introduction
- Using an empty aggregate to evaluate sample size
- Evaluating the need to sample from the initial data
- Using CHAID stumps when interviewing an SME
- Using a single cluster K-means as an alternative to anomaly detection
- Using an @NULL multiple Derive to explore missing data
- Creating an Outlier report to give to SMEs
- Detecting potential model instability early using the Partition node and Feature Selection node
2. Data Preparation – Select
- Introduction
- Using the Feature Selection node creatively to remove or decapitate perfect predictors
- Running a Statistics node on anti-join to evaluate the potential missing data
- Evaluating the use of sampling for speed
- Removing redundant variables using correlation matrices
- Selecting variables using the CHAID Modeling node
- Selecting variables using the Means node
- Selecting variables using single-antecedent Association Rules
3. Data Preparation – Clean
- Introduction
- Binning scale variables to address missing data
- Using a full data model/partial data model approach to address missing data
- Imputing in-stream mean or median
- Imputing missing values randomly from uniform or normal distributions
- Using random imputation to match a variable's distribution
- Searching for similar records using a Neural Network for inexact matching
- Using neuro-fuzzy searching to find similar names
- Producing longer Soundex codes
4. Data Preparation – Construct
- Introduction
- Building transformations with multiple Derive nodes
- Calculating and comparing conversion rates
- Grouping categorical values
- Transforming high skew and kurtosis variables with a multiple Derive node
- Creating flag variables for aggregation
- Using Association Rules for interaction detection/feature creation
- Creating time-aligned cohorts
5. Data Preparation – Integrate and Format
- Introduction
- Speeding up merge with caching and optimization settings
- Merging a lookup table
- Shuffle-down (nonstandard aggregation)
- Cartesian product merge using key-less merge by key
- Multiplying out using Cartesian product merge, user source, and derive dummy
- Changing large numbers of variable names without scripting
- Parsing nonstandard dates
- Parsing and performing a conversion on a complex stream
- Sequence processing
6. Selecting and Building a Model
- Introduction
- Evaluating balancing with Auto Classifier
- Building models with and without outliers
- Using Neural Network for Feature Selection
- Creating a bootstrap sample
- Creating bagged logistic regression models
- Using KNN to match similar cases
- Using Auto Classifier to tune models
- Next-Best-Offer for large datasets
7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
- Introduction
- How (and why) to validate as well as test
- Using classification trees to explore the predictions of a Neural Network
- Correcting a confusion matrix for an imbalanced target variable by incorporating priors
- Using aggregate to write cluster centers to Excel for conditional formatting
- Creating a classification tree financial summary using aggregate and an Excel Export node
- Reformatting data for reporting with a Transpose node
- Changing formatting of fields in a Table node
- Combining generated filters
8. CLEM Scripting
- Introduction
- Building iterative Neural Network forecasts
- Quantifying variable importance with Monte Carlo simulation
- Implementing champion/challenger model management
- Detecting outliers with the jackknife method
- Optimizing K-means cluster solutions
- Automating time series forecasts
- Automating HTML reports and graphs
- Rolling your own modeling algorithm – Weibull analysis
- A. Business Understanding
- Index
Product information
- Title: IBM SPSS Modeler Cookbook
- Author(s):
- Release date: October 2013
- Publisher(s): Packt Publishing
- ISBN: 9781849685467