Book description
The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before.
This book provides the tools needed to thrive in today's big data world. The author demonstrates how to leverage a company's existing databases to increase profits and market share, and carefully explains the most current data science methods and techniques. The reader will "learn data mining by doing data mining." With new chapters on preparing to model the data, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining.
- The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis
- Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an appendix on Data Summarization and Visualization
- Offers extensive coverage of the R statistical programming language
- Contains 280 end-of-chapter exercises
- Includes a companion website with further resources for all readers, plus PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book
Table of contents
Preface
- What is Data Mining?
- Why is This Book Needed?
- What's New for the Second Edition?
- Danger! Data Mining is Easy to Do Badly
- “White Box” Approach: Understanding the Underlying Algorithmic and Model Structures
- Data Mining as a Process
- Graphical Approach, Emphasizing Exploratory Data Analysis
- How The Book is Structured
- Acknowledgments
Chapter 1: An Introduction to Data Mining
Chapter 2: Data Preprocessing
- 2.1 Why do We Need to Preprocess the Data?
- 2.2 Data Cleaning
- 2.3 Handling Missing Data
- 2.4 Identifying Misclassifications
- 2.5 Graphical Methods for Identifying Outliers
- 2.6 Measures of Center and Spread
- 2.7 Data Transformation
- 2.8 Min-Max Normalization
- 2.9 Z-Score Standardization
- 2.10 Decimal Scaling
- 2.11 Transformations to Achieve Normality
- 2.12 Numerical Methods for Identifying Outliers
- 2.13 Flag Variables
- 2.14 Transforming Categorical Variables into Numerical Variables
- 2.15 Binning Numerical Variables
- 2.16 Reclassifying Categorical Variables
- 2.17 Adding an Index Field
- 2.18 Removing Variables that are Not Useful
- 2.19 Variables that Should Probably Not Be Removed
- 2.20 Removal of Duplicate Records
- 2.21 A Word About Id Fields
- References
- Exercises
- Hands-On Analysis
- Notes
Chapter 3: Exploratory Data Analysis
- 3.1 Hypothesis Testing Versus Exploratory Data Analysis
- 3.2 Getting to Know the Data Set
- 3.3 Exploring Categorical Variables
- 3.4 Exploring Numeric Variables
- 3.5 Exploring Multivariate Relationships
- 3.6 Selecting Interesting Subsets of the Data for Further Investigation
- 3.7 Using EDA to Uncover Anomalous Fields
- 3.8 Binning Based on Predictive Value
- 3.9 Deriving New Variables: Flag Variables
- 3.10 Deriving New Variables: Numerical Variables
- 3.11 Using EDA to Investigate Correlated Predictor Variables
- 3.12 Summary
- Reference
- Exercises
- Hands-On Analysis
- Note
Chapter 4: Univariate Statistical Analysis
- 4.1 Data Mining Tasks in Discovering Knowledge in Data
- 4.2 Statistical Approaches to Estimation and Prediction
- 4.3 Statistical Inference
- 4.4 How Confident are We in Our Estimates?
- 4.5 Confidence Interval Estimation of the Mean
- 4.6 How to Reduce the Margin of Error
- 4.7 Confidence Interval Estimation of the Proportion
- 4.8 Hypothesis Testing for the Mean
- 4.9 Assessing the Strength of Evidence Against the Null Hypothesis
- 4.10 Using Confidence Intervals to Perform Hypothesis Tests
- 4.11 Hypothesis Testing for the Proportion
- Reference
- Exercises
Chapter 5: Multivariate Statistics
- 5.1 Two-Sample t-Test for Difference in Means
- 5.2 Two-Sample Z-Test for Difference in Proportions
- 5.3 Test for Homogeneity of Proportions
- 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data
- 5.5 Analysis of Variance
- 5.6 Regression Analysis
- 5.7 Hypothesis Testing in Regression
- 5.8 Measuring the Quality of a Regression Model
- 5.9 Dangers of Extrapolation
- 5.10 Confidence Intervals for the Mean Value of y Given x
- 5.11 Prediction Intervals for a Randomly Chosen Value of y Given x
- 5.12 Multiple Regression
- 5.13 Verifying Model Assumptions
- Reference
- Exercises
- Hands-On Analysis
- Note
Chapter 6: Preparing to Model the Data
Chapter 7: k-Nearest Neighbor Algorithm
- 7.1 Classification Task
- 7.2 k-Nearest Neighbor Algorithm
- 7.3 Distance Function
- 7.4 Combination Function
- 7.5 Quantifying Attribute Relevance: Stretching the Axes
- 7.6 Database Considerations
- 7.7 k-Nearest Neighbor Algorithm for Estimation and Prediction
- 7.8 Choosing k
- 7.9 Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler
- Exercises
- Hands-On Analysis
Chapter 8: Decision Trees
Chapter 9: Neural Networks
- 9.1 Input and Output Encoding
- 9.2 Neural Networks for Estimation and Prediction
- 9.3 Simple Example of a Neural Network
- 9.4 Sigmoid Activation Function
- 9.5 Back-Propagation
- 9.6 Termination Criteria
- 9.7 Learning Rate
- 9.8 Momentum Term
- 9.9 Sensitivity Analysis
- 9.10 Application of Neural Network Modeling
- References
- Exercises
- Hands-On Analysis
Chapter 10: Hierarchical and k-Means Clustering
- 10.1 The Clustering Task
- 10.2 Hierarchical Clustering Methods
- 10.3 Single-Linkage Clustering
- 10.4 Complete-Linkage Clustering
- 10.5 k-Means Clustering
- 10.6 Example of k-Means Clustering at Work
- 10.7 Behavior of MSB, MSE, and PSEUDO-F as the k-Means Algorithm Proceeds
- 10.8 Application of k-Means Clustering Using SAS Enterprise Miner
- 10.9 Using Cluster Membership to Predict Churn
- References
- Exercises
- Hands-On Analysis
- Note
Chapter 11: Kohonen Networks
- 11.1 Self-Organizing Maps
- 11.2 Kohonen Networks
- 11.3 Example of a Kohonen Network Study
- 11.4 Cluster Validity
- 11.5 Application of Clustering Using Kohonen Networks
- 11.6 Interpreting the Clusters
- 11.7 Using Cluster Membership as Input to Downstream Data Mining Models
- References
- Exercises
- Hands-On Analysis
Chapter 12: Association Rules
- 12.1 Affinity Analysis and Market Basket Analysis
- 12.2 Support, Confidence, Frequent Itemsets, and the a Priori Property
- 12.3 How Does the a Priori Algorithm Work?
- 12.4 Extension from Flag Data to General Categorical Data
- 12.5 Information-Theoretic Approach: Generalized Rule Induction Method
- 12.6 Association Rules are Easy to do Badly
- 12.7 How can we Measure the Usefulness of Association Rules?
- 12.8 Do Association Rules Represent Supervised or Unsupervised Learning?
- 12.9 Local Patterns Versus Global Models
- References
- Exercises
- Hands-On Analysis
Chapter 13: Imputation of Missing Data
Chapter 14: Model Evaluation Techniques
- 14.1 Model Evaluation Techniques for the Description Task
- 14.2 Model Evaluation Techniques for the Estimation and Prediction Tasks
- 14.3 Model Evaluation Techniques for the Classification Task
- 14.4 Error Rate, False Positives, and False Negatives
- 14.5 Sensitivity and Specificity
- 14.6 Misclassification Cost Adjustment to Reflect Real-World Concerns
- 14.7 Decision Cost/Benefit Analysis
- 14.8 Lift Charts and Gains Charts
- 14.9 Interweaving Model Evaluation with Model Building
- 14.10 Confluence of Results: Applying a Suite of Models
- Reference
- Exercises
- Hands-On Analysis
- Notes
Appendix: Data Summarization and Visualization
Index
End User License Agreement
Product information
- Title: Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition
- Author(s):
- Release date: July 2014
- Publisher(s): Wiley
- ISBN: 9780470908747
You might also like
Data Mining and Machine Learning Applications
The book elaborates in detail on the current needs of …
Data Mining and Predictive Analytics, 2nd Edition
Learn methods of data analysis and their application to real-world data sets. This updated second edition …
Data Mining For Dummies
Delve into your data for the key to success. Data mining is quickly becoming integral to …
Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, 2nd Edition
Praise for the First Edition "...a well-written book on data analysis and data mining that provides …