Fundamentals of Predictive Analytics with JMP, Second Edition

Book description

Written for students in undergraduate and graduate statistics courses, as well as for the practitioner who wants to make better decisions from data and models, this updated and expanded second edition of Fundamentals of Predictive Analytics with JMP® bridges the gap between courses on basic statistics, which focus on univariate and bivariate analysis, and courses on data mining and predictive analytics. Going beyond the theoretical foundation, this book gives you the technical knowledge and problem-solving skills that you need to perform real-world multivariate data analysis.

First, this book teaches you to recognize when it is appropriate to use a tool, what variables and data are required, and what the results might be. Second, it teaches you how to interpret the results and then, step-by-step, how and where to perform and evaluate the analysis in JMP.

Using JMP 13 and JMP Pro 13, this book covers the following new and enhanced features in an example-driven format:

  • an add-in for Microsoft Excel
  • Graph Builder
  • dirty data
  • visualization
  • regression
  • logistic regression
  • principal component analysis
  • elastic net
  • cluster analysis
  • decision trees
  • k-nearest neighbors
  • neural networks
  • bootstrap forests
  • boosted trees
  • text mining
  • association rules
  • model comparison

With today’s emphasis on business intelligence, business analytics, and predictive analytics, this second edition is invaluable to anyone who needs to expand his or her knowledge of statistics and apply it to real-world, problem-solving analysis.

This book is part of the SAS Press program.

Table of contents

  1. About This Book
  2. About These Authors
  3. Acknowledgments
  4. Chapter 1: Introduction
  5. Historical Perspective
  6. Two Questions Organizations Need to Ask
    1. Return on Investment
    2. Cultural Change
  7. Business Intelligence and Business Analytics
  8. Introductory Statistics Courses
    1. The Problem of Dirty Data
    2. Added Complexities in Multivariate Analysis
  9. Practical Statistical Study
    1. Obtaining and Cleaning the Data
    2. Understanding the Statistical Study as a Story
    3. The Plan-Perform-Analyze-Reflect Cycle
    4. Using Powerful Software
  10. Framework and Chapter Sequence
  11. Chapter 2: Statistics Review
  12. Introduction
  13. Fundamental Concepts 1 and 2
    1. FC1: Always Take a Random and Representative Sample
    2. FC2: Remember That Statistics Is Not an Exact Science
  14. Fundamental Concept 3: Understand a Z-Score
  15. Fundamental Concept 4
    1. FC4: Understand the Central Limit Theorem
    2. Learn from an Example
  16. Fundamental Concept 5
    1. Understand One-Sample Hypothesis Testing
    2. Consider p-Values
  17. Fundamental Concept 6
    1. Understand That Few Approaches/Techniques Are Correct—Many Are Wrong
    2. Three Possible Outcomes When You Choose a Technique
  18. Chapter 3: Dirty Data
  19. Introduction
  20. Data Set
  21. Error Detection
  22. Outlier Detection
    1. Approach 1
    2. Approach 2
    3. Missing Values
    4. Statistical Assumptions of Patterns of Missing Data
    5. Conventional Correction Methods
    6. The JMP Approach
    7. Example Using JMP
  23. General First Steps on Receipt of a Data Set
  24. Exercises
  25. Chapter 4: Data Discovery with Multivariate Data
  26. Introduction
  27. Use Tables to Explore Multivariate Data
    1. PivotTables
    2. Tabulate in JMP
  28. Use Graphs to Explore Multivariate Data
    1. Graph Builder
    2. Scatterplot
  29. Explore a Larger Data Set
    1. Trellis Chart
    2. Bubble Plot
  30. Explore a Real-World Data Set
    1. Use Graph Builder to Examine Results of Analyses
    2. Generate a Trellis Chart and Examine Results
    3. Use Dynamic Linking to Explore Comparisons in a Small Data Subset
    4. Return to Graph Builder to Sort and Visualize a Larger Data Set
  31. Chapter 5: Regression and ANOVA
  32. Introduction
  33. Regression
    1. Perform a Simple Regression and Examine Results
    2. Understand and Perform Multiple Regression
    3. Understand and Perform Regression with Categorical Data
  34. Analysis of Variance
    1. Perform a One-Way ANOVA
    2. Evaluate the Model
    3. Perform a Two-Way ANOVA
  35. Exercises
  36. Chapter 6: Logistic Regression
  37. Introduction
    1. Dependence Technique
    2. The Linear Probability Model
    3. The Logistic Function
  38. A Straightforward Example Using JMP
    1. Create a Dummy Variable
    2. Use a Contingency Table to Determine the Odds Ratio
    3. Calculate the Odds Ratio
  39. A Realistic Logistic Regression Statistical Study
    1. Understand the Model-Building Approach
    2. Run Bivariate Analyses
    3. Run the Initial Regression and Examine the Results
    4. Convert a Continuous Variable to Discrete Variables
    5. Produce Interaction Variables
    6. Validate and Use the Model
  40. Exercises
  41. Chapter 7: Principal Components Analysis
  42. Introduction
  43. Basic Steps in JMP
    1. Produce the Correlations and Scatterplot Matrix
    2. Create the Principal Components
    3. Run a Regression of y on Prin1, Excluding Prin2
    4. Understand Eigenvalue Analysis
    5. Conduct the Eigenvalue Analysis and the Bartlett Test
    6. Verify Lack of Correlation
  44. Dimension Reduction
    1. Produce the Correlations and Scatterplot Matrix
    2. Conduct the Principal Component Analysis
    3. Determine the Number of Principal Components to Select
    4. Compare Methods for Determining the Number of Components
  45. Discovery of Structure in the Data
    1. A Straightforward Example
    2. An Example with Less Well-Defined Data
  46. Exercises
  47. Chapter 8: Least Absolute Shrinkage and Selection Operator and Elastic Net
  48. Introduction
    1. The Importance of the Bias-Variance Tradeoff
    2. Ridge Regression
  49. Least Absolute Shrinkage and Selection Operator
    1. Perform the Technique
    2. Examine the Results
    3. Refine the Results
  50. Elastic Net
    1. Perform the Technique
    2. Examine the Results
    3. Compare with LASSO
  51. Exercises
  52. Chapter 9: Cluster Analysis
  53. Introduction
    1. Example Applications
    2. An Example from the Credit Card Industry
    3. The Need to Understand Statistics and the Business Problem
  54. Hierarchical Clustering
    1. Understand the Dendrogram
    2. Understand the Methods for Calculating Distance between Clusters
    3. Perform a Hierarchical Clustering with Complete Linkage
    4. Examine the Results
    5. Consider a Scree Plot to Discern the Best Number of Clusters
    6. Apply the Principles to a Small but Rich Data Set
    7. Consider Adding Clusters in a Regression Analysis
  55. K-Means Clustering
    1. Understand the Benefits and Drawbacks of the Method
    2. Choose k and Determine the Clusters
    3. Perform k-Means Clustering
    4. Change the Number of Clusters
    5. Create a Profile of the Clusters with Parallel Coordinate Plots
    6. Perform Iterative Clustering
    7. Score New Observations
  56. K-Means Clustering versus Hierarchical Clustering
  57. Exercises
  58. Chapter 10: Decision Trees
  59. Introduction
    1. Benefits and Drawbacks
    2. Definitions and an Example
    3. Theoretical Questions
  60. Classification Trees
    1. Begin Tree and Observe Results
    2. Use JMP to Choose the Split That Maximizes the LogWorth Statistic
    3. Split the Root Node According to Rank of Variables
    4. Split Second Node According to the College Variable
    5. Examine Results and Predict the Variable for a Third Split
    6. Examine Results and Predict the Variable for a Fourth Split
    7. Examine Results and Continue Splitting to Gain Actionable Insights
    8. Prune to Simplify Overgrown Trees
    9. Examine Receiver Operating Characteristic and Lift Curves
  61. Regression Trees
    1. Understand How Regression Trees Work
    2. Restart a Regression Driven by Practical Questions
    3. Use Column Contributions and Leaf Reports for Large Data Sets
  62. Exercises
  63. Chapter 11: k-Nearest Neighbors
  64. Introduction
    1. Example—Age and Income as Correlates of Purchase
    2. The Way That JMP Resolves Ties
    3. The Need to Standardize Units of Measurement
  65. k-Nearest Neighbors Analysis
    1. Perform the Analysis
    2. Make Predictions for New Data
  66. k-Nearest Neighbor for Multiclass Problems
    1. Understand the Variables
    2. Perform the Analysis and Examine Results
  67. The k-Nearest Neighbor Regression Models
    1. Perform a Linear Regression as a Basis for Comparison
    2. Apply the k-Nearest Neighbors Technique
    3. Compare the Two Methods
    4. Make Predictions for New Data
  68. Limitations and Drawbacks of the Technique
  69. Exercises
  70. Chapter 12: Neural Networks
  71. Introduction
    1. Drawbacks and Benefits
    2. A Simplified Representation
    3. A More Realistic Representation
  72. Understand Validation Methods
    1. Holdback Validation
    2. k-Fold Cross-Validation
  73. Understand the Hidden Layer Structure
    1. A Few Guidelines for Determining Number of Nodes
    2. Practical Strategies for Determining Number of Nodes
    3. The Method of Boosting
  74. Understand Options for Improving the Fit of a Model
  75. Complete the Data Preparation
  76. Use JMP on an Example Data Set
    1. Perform a Linear Regression as a Baseline
    2. Perform the Neural Network Ten Times to Assess Default Performance
    3. Boost the Default Model
    4. Compare Transformation of Variables and Methods of Validation
  77. Exercises
  78. Chapter 13: Bootstrap Forests and Boosted Trees
  79. Introduction
  80. Bootstrap Forests
    1. Understand Bagged Trees
    2. Perform a Bootstrap Forest
    3. Perform a Bootstrap Forest for Regression Trees
  81. Boosted Trees
    1. Understand Boosting
    2. Perform Boosting
    3. Perform a Boosted Tree for Regression Trees
    4. Use Validation and Training Samples
  82. Exercises
  83. Chapter 14: Model Comparison
  84. Introduction
  85. Perform a Model Comparison with Continuous Dependent Variable
    1. Understand Absolute Measures
    2. Understand Relative Measures
    3. Understand Correlation between Variable and Prediction
    4. Explore the Uses of the Different Measures
  86. Perform a Model Comparison with Binary Dependent Variable
    1. Understand the Confusion Matrix and Its Limitations
    2. Understand True Positive Rate and False Positive Rate
    3. Interpret Receiver Operating Characteristic Curves
    4. Compare Two Example Models Predicting Churn
  87. Perform a Model Comparison Using the Lift Chart
  88. Train, Validate, and Test
    1. Perform Stepwise Regression
    2. Examine the Results of Stepwise Regression
    3. Compute the MSE, MAE, and Correlation
    4. Examine the Results for MSE, MAE, and Correlation
    5. Understand Overfitting from a Coin-Flip Example
    6. Use the Model Comparison Platform
  89. Exercises
  90. Chapter 15: Text Mining
  91. Introduction
    1. Historical Perspective
    2. Unstructured Data
  92. Developing the Document Term Matrix
    1. Understand the Tokenizing Stage
    2. Understand the Phrasing Stage
    3. Understand the Terming Stage
    4. Observe the Order of Operations
  93. Developing the Document Term Matrix with a Larger Data Set
    1. Generate a Word Cloud and Examine the Text
    2. Examine and Group Terms
    3. Add Frequent Phrases to List of Terms
    4. Parse the List of Terms
  94. Using Multivariate Techniques
    1. Perform Latent Semantic Analysis
    2. Perform Topic Analysis
    3. Perform Cluster Analysis
  95. Using Predictive Techniques
    1. Perform Primary Analysis
    2. Perform Logistic Regressions
  96. Exercises
  97. Chapter 16: Market Basket Analysis
  98. Introduction
    1. Association Analyses
    2. Examples
  99. Understand Support, Confidence, and Lift
    1. Association Rules
    2. Support
    3. Confidence
    4. Lift
  100. Use JMP to Calculate Confidence and Lift
    1. Use the Apriori Algorithm for More Complex Data Sets
    2. Form Rules and Calculate Confidence and Lift
  101. Analyze a Real Data Set
    1. Perform Association Analysis with Default Settings
    2. Reduce the Number of Rules and Sort Them
    3. Examine Results
    4. Target Results to Take Business Actions
  102. Exercises
  103. Chapter 17: Statistical Storytelling
  104. The Path from Multivariate Data to the Modeling Process
    1. Early Applications of Data Mining
    2. Numerous JMP Customer Stories of Modern Applications
  105. Definitions of Data Mining
    1. Data Mining
    2. Predictive Analytics
  106. A Framework for Predictive Analytics Techniques
  107. The Goal, Tasks, and Phases of Predictive Analytics
    1. The Difference between Statistics and Data Mining
    2. SEMMA
  108. References
  109. Index

Product information

  • Title: Fundamentals of Predictive Analytics with JMP, Second Edition
  • Author(s): Ron Klimberg, B. D. McCullough
  • Release date: December 2017
  • Publisher(s): SAS Institute
  • ISBN: 9781629608013