Book description
The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before.
This book provides the tools needed to thrive in today's big data world. The author demonstrates how to leverage a company's existing databases to increase profits and market share, and carefully explains the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining.
The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization
Offers extensive coverage of the R statistical programming language
Contains 280 end-of-chapter exercises
Includes a companion website with further resources for all readers, and PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book
Table of contents

Preface
 What is Data Mining?
 Why is This Book Needed?
 What's New for the Second Edition?
 Danger! Data Mining is Easy to Do Badly
 “White Box” Approach: Understanding the Underlying Algorithmic and Model Structures
 Data Mining as a Process
 Graphical Approach, Emphasizing Exploratory Data Analysis
 How The Book is Structured
 Acknowledgments

Chapter 1: An Introduction to Data Mining

Chapter 2: Data Preprocessing
 2.1 Why do We Need to Preprocess the Data?
 2.2 Data Cleaning
 2.3 Handling Missing Data
 2.4 Identifying Misclassifications
 2.5 Graphical Methods for Identifying Outliers
 2.6 Measures of Center and Spread
 2.7 Data Transformation
 2.8 Min-Max Normalization
 2.9 Z-Score Standardization
 2.10 Decimal Scaling
 2.11 Transformations to Achieve Normality
 2.12 Numerical Methods for Identifying Outliers
 2.13 Flag Variables
 2.14 Transforming Categorical Variables into Numerical Variables
 2.15 Binning Numerical Variables
 2.16 Reclassifying Categorical Variables
 2.17 Adding an Index Field
 2.18 Removing Variables that are Not Useful
 2.19 Variables that Should Probably Not Be Removed
 2.20 Removal of Duplicate Records
 2.21 A Word About Id Fields
 References
 Exercises
 Hands-On Analysis
 Notes

Chapter 3: Exploratory Data Analysis
 3.1 Hypothesis Testing Versus Exploratory Data Analysis
 3.2 Getting to Know the Data Set
 3.3 Exploring Categorical Variables
 3.4 Exploring Numeric Variables
 3.5 Exploring Multivariate Relationships
 3.6 Selecting Interesting Subsets of the Data for Further Investigation
 3.7 Using EDA to Uncover Anomalous Fields
 3.8 Binning Based on Predictive Value
 3.9 Deriving New Variables: Flag Variables
 3.10 Deriving New Variables: Numerical Variables
 3.11 Using EDA to Investigate Correlated Predictor Variables
 3.12 Summary
 Reference
 Exercises
 Hands-On Analysis
 Note

Chapter 4: Univariate Statistical Analysis
 4.1 Data Mining Tasks in Discovering Knowledge in Data
 4.2 Statistical Approaches to Estimation and Prediction
 4.3 Statistical Inference
 4.4 How Confident are We in Our Estimates?
 4.5 Confidence Interval Estimation of the Mean
 4.6 How to Reduce the Margin of Error
 4.7 Confidence Interval Estimation of the Proportion
 4.8 Hypothesis Testing for the Mean
 4.9 Assessing the Strength of Evidence Against the Null Hypothesis
 4.10 Using Confidence Intervals to Perform Hypothesis Tests
 4.11 Hypothesis Testing for the Proportion
 Reference
 Exercises

Chapter 5: Multivariate Statistics
 5.1 Two-Sample t-Test for Difference in Means
 5.2 Two-Sample Z-Test for Difference in Proportions
 5.3 Test for Homogeneity of Proportions
 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data
 5.5 Analysis of Variance
 5.6 Regression Analysis
 5.7 Hypothesis Testing in Regression
 5.8 Measuring the Quality of a Regression Model
 5.9 Dangers of Extrapolation
 5.10 Confidence Intervals for the Mean Value of y Given x
 5.11 Prediction Intervals for a Randomly Chosen Value of y Given x
 5.12 Multiple Regression
 5.13 Verifying Model Assumptions
 Reference
 Exercises
 Hands-On Analysis
 Note

Chapter 6: Preparing to Model the Data

Chapter 7: k-Nearest Neighbor Algorithm
 7.1 Classification Task
 7.2 k-Nearest Neighbor Algorithm
 7.3 Distance Function
 7.4 Combination Function
 7.5 Quantifying Attribute Relevance: Stretching the Axes
 7.6 Database Considerations
 7.7 k-Nearest Neighbor Algorithm for Estimation and Prediction
 7.8 Choosing k
 7.9 Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler
 Exercises
 Hands-On Analysis

Chapter 8: Decision Trees

Chapter 9: Neural Networks
 9.1 Input and Output Encoding
 9.2 Neural Networks for Estimation and Prediction
 9.3 Simple Example of a Neural Network
 9.4 Sigmoid Activation Function
 9.5 Back-Propagation
 9.6 Termination Criteria
 9.7 Learning Rate
 9.8 Momentum Term
 9.9 Sensitivity Analysis
 9.10 Application of Neural Network Modeling
 References
 Exercises
 Hands-On Analysis

Chapter 10: Hierarchical and k-Means Clustering
 10.1 The Clustering Task
 10.2 Hierarchical Clustering Methods
 10.3 Single-Linkage Clustering
 10.4 Complete-Linkage Clustering
 10.5 k-Means Clustering
 10.6 Example of k-Means Clustering at Work
 10.7 Behavior of MSB, MSE, and PSEUDO-F as the k-Means Algorithm Proceeds
 10.8 Application of k-Means Clustering Using SAS Enterprise Miner
 10.9 Using Cluster Membership to Predict Churn
 References
 Exercises
 Hands-On Analysis
 Note

Chapter 11: Kohonen Networks
 11.1 Self-Organizing Maps
 11.2 Kohonen Networks
 11.3 Example of a Kohonen Network Study
 11.4 Cluster Validity
 11.5 Application of Clustering Using Kohonen Networks
 11.6 Interpreting the Clusters
 11.7 Using Cluster Membership as Input to Downstream Data Mining Models
 References
 Exercises
 Hands-On Analysis

Chapter 12: Association Rules
 12.1 Affinity Analysis and Market Basket Analysis
 12.2 Support, Confidence, Frequent Itemsets, and the A Priori Property
 12.3 How Does the A Priori Algorithm Work?
 12.4 Extension from Flag Data to General Categorical Data
 12.5 Information-Theoretic Approach: Generalized Rule Induction Method
 12.6 Association Rules are Easy to do Badly
 12.7 How can we Measure the Usefulness of Association Rules?
 12.8 Do Association Rules Represent Supervised or Unsupervised Learning?
 12.9 Local Patterns Versus Global Models
 References
 Exercises
 Hands-On Analysis

Chapter 13: Imputation of Missing Data

Chapter 14: Model Evaluation Techniques
 14.1 Model Evaluation Techniques for the Description Task
 14.2 Model Evaluation Techniques for the Estimation and Prediction Tasks
 14.3 Model Evaluation Techniques for the Classification Task
 14.4 Error Rate, False Positives, and False Negatives
 14.5 Sensitivity and Specificity
 14.6 Misclassification Cost Adjustment to Reflect Real-World Concerns
 14.7 Decision Cost/Benefit Analysis
 14.8 Lift Charts and Gains Charts
 14.9 Interweaving Model Evaluation with Model Building
 14.10 Confluence of Results: Applying a Suite of Models
 Reference
 Exercises
 Hands-On Analysis
 Notes
 Appendix: Data Summarization and Visualization
 Index
 End User License Agreement
Product information
 Title: Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition
 Author(s):
 Release date: July 2014
 Publisher(s): Wiley
 ISBN: 9780470908747