Data Mining

Book description

New technologies have enabled us to collect massive amounts of data in many fields. However, our pace of discovering useful information and knowledge from these data falls far behind our pace of collecting the data. Data Mining: Theories, Algorithms, and Examples introduces and explains a comprehensive set of data mining algorithms from various dat

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. Preface
  7. Acknowledgments
  8. Author
  9. Part I An Overview of Data Mining
    1. 1. Introduction to Data, Data Patterns, and Data Mining
      1. 1.1 Examples of Small Data Sets
      2. 1.2 Types of Data Variables
        1. 1.2.1 Attribute Variable versus Target Variable
        2. 1.2.2 Categorical Variable versus Numeric Variable
      3. 1.3 Data Patterns Learned through Data Mining
        1. 1.3.1 Classification and Prediction Patterns
        2. 1.3.2 Cluster and Association Patterns
        3. 1.3.3 Data Reduction Patterns
        4. 1.3.4 Outlier and Anomaly Patterns
        5. 1.3.5 Sequential and Temporal Patterns
      4. 1.4 Training Data and Test Data
      5. Exercises
  10. Part II Algorithms for Mining Classification and Prediction Patterns
    1. 2. Linear and Nonlinear Regression Models
      1. 2.1 Linear Regression Models
      2. 2.2 Least-Squares Method and Maximum Likelihood Method of Parameter Estimation
      3. 2.3 Nonlinear Regression Models and Parameter Estimation
      4. 2.4 Software and Applications
      5. Exercises
    2. 3. Naïve Bayes Classifier
      1. 3.1 Bayes Theorem
      2. 3.2 Classification Based on the Bayes Theorem and Naïve Bayes Classifier
      3. 3.3 Software and Applications
      4. Exercises
    3. 4. Decision and Regression Trees
      1. 4.1 Learning a Binary Decision Tree and Classifying Data Using a Decision Tree
        1. 4.1.1 Elements of a Decision Tree
        2. 4.1.2 Decision Tree with the Minimum Description Length
        3. 4.1.3 Split Selection Methods
        4. 4.1.4 Algorithm for the Top-Down Construction of a Decision Tree
        5. 4.1.5 Classifying Data Using a Decision Tree
      2. 4.2 Learning a Nonbinary Decision Tree
      3. 4.3 Handling Numeric and Missing Values of Attribute Variables
      4. 4.4 Handling a Numeric Target Variable and Constructing a Regression Tree
      5. 4.5 Advantages and Shortcomings of the Decision Tree Algorithm
      6. 4.6 Software and Applications
      7. Exercises
    4. 5. Artificial Neural Networks for Classification and Prediction
      1. 5.1 Processing Units of ANNs
      2. 5.2 Architectures of ANNs
      3. 5.3 Methods of Determining Connection Weights for a Perceptron
        1. 5.3.1 Perceptron
        2. 5.3.2 Properties of a Processing Unit
        3. 5.3.3 Graphical Method of Determining Connection Weights and Biases
        4. 5.3.4 Learning Method of Determining Connection Weights and Biases
        5. 5.3.5 Limitation of a Perceptron
      4. 5.4 Back-Propagation Learning Method for a Multilayer Feedforward ANN
      5. 5.5 Empirical Selection of an ANN Architecture for a Good Fit to Data
      6. 5.6 Software and Applications
      7. Exercises
    5. 6. Support Vector Machines
      1. 6.1 Theoretical Foundation for Formulating and Solving an Optimization Problem to Learn a Classification Function
      2. 6.2 SVM Formulation for a Linear Classifier and a Linearly Separable Problem
      3. 6.3 Geometric Interpretation of the SVM Formulation for the Linear Classifier
      4. 6.4 Solution of the Quadratic Programming Problem for a Linear Classifier
      5. 6.5 SVM Formulation for a Linear Classifier and a Nonlinearly Separable Problem
      6. 6.6 SVM Formulation for a Nonlinear Classifier and a Nonlinearly Separable Problem
      7. 6.7 Methods of Using SVM for Multi-Class Classification Problems
      8. 6.8 Comparison of ANN and SVM
      9. 6.9 Software and Applications
      10. Exercises
    6. 7. k-Nearest Neighbor Classifier and Supervised Clustering
      1. 7.1 k-Nearest Neighbor Classifier
      2. 7.2 Supervised Clustering
      3. 7.3 Software and Applications
      4. Exercises
  11. Part III Algorithms for Mining Cluster and Association Patterns
    1. 8. Hierarchical Clustering
      1. 8.1 Procedure of Agglomerative Hierarchical Clustering
      2. 8.2 Methods of Determining the Distance between Two Clusters
      3. 8.3 Illustration of the Hierarchical Clustering Procedure
      4. 8.4 Nonmonotonic Tree of Hierarchical Clustering
      5. 8.5 Software and Applications
      6. Exercises
    2. 9. K-Means Clustering and Density-Based Clustering
      1. 9.1 K-Means Clustering
      2. 9.2 Density-Based Clustering
      3. 9.3 Software and Applications
      4. Exercises
    3. 10. Self-Organizing Map
      1. 10.1 Algorithm of Self-Organizing Map
      2. 10.2 Software and Applications
      3. Exercises
    4. 11. Probability Distributions of Univariate Data
      1. 11.1 Probability Distribution of Univariate Data and Probability Distribution Characteristics of Various Data Patterns
      2. 11.2 Method of Distinguishing Four Probability Distributions
      3. 11.3 Software and Applications
      4. Exercises
    5. 12. Association Rules
      1. 12.1 Definition of Association Rules and Measures of Association
      2. 12.2 Association Rule Discovery
      3. 12.3 Software and Applications
      4. Exercises
    6. 13. Bayesian Network
      1. 13.1 Structure of a Bayesian Network and Probability Distributions of Variables
      2. 13.2 Probabilistic Inference
      3. 13.3 Learning of a Bayesian Network
      4. 13.4 Software and Applications
      5. Exercises
  12. Part IV Algorithms for Mining Data Reduction Patterns
    1. 14. Principal Component Analysis
      1. 14.1 Review of Multivariate Statistics
      2. 14.2 Review of Matrix Algebra
      3. 14.3 Principal Component Analysis
      4. 14.4 Software and Applications
      5. Exercises
    2. 15. Multidimensional Scaling
      1. 15.1 Algorithm of MDS
      2. 15.2 Number of Dimensions
      3. 15.3 INDSCALE for Weighted MDS
      4. 15.4 Software and Applications
      5. Exercises
  13. Part V Algorithms for Mining Outlier and Anomaly Patterns
    1. 16. Univariate Control Charts
      1. 16.1 Shewhart Control Charts
      2. 16.2 CUSUM Control Charts
      3. 16.3 EWMA Control Charts
      4. 16.4 Cuscore Control Charts
      5. 16.5 Receiver Operating Curve (ROC) for Evaluation and Comparison of Control Charts
      6. 16.6 Software and Applications
      7. Exercises
    2. 17. Multivariate Control Charts
      1. 17.1 Hotelling’s T² Control Charts
      2. 17.2 Multivariate EWMA Control Charts
      3. 17.3 Chi-Square Control Charts
      4. 17.4 Applications
      5. Exercises
  14. Part VI Algorithms for Mining Sequential and Temporal Patterns
    1. 18. Autocorrelation and Time Series Analysis
      1. 18.1 Autocorrelation
      2. 18.2 Stationarity and Nonstationarity
      3. 18.3 ARMA Models of Stationary Series Data
      4. 18.4 ACF and PACF Characteristics of ARMA Models
      5. 18.5 Transformations of Nonstationary Series Data and ARIMA Models
      6. 18.6 Software and Applications
      7. Exercises
    2. 19. Markov Chain Models and Hidden Markov Models
      1. 19.1 Markov Chain Models
      2. 19.2 Hidden Markov Models
      3. 19.3 Learning Hidden Markov Models
      4. 19.4 Software and Applications
      5. Exercises
    3. 20. Wavelet Analysis
      1. 20.1 Definition of Wavelet
      2. 20.2 Wavelet Transform of Time Series Data
      3. 20.3 Reconstruction of Time Series Data from Wavelet Coefficients
      4. 20.4 Software and Applications
      5. Exercises
  15. References
  16. Index

Product information

  • Title: Data Mining
  • Author(s): Nong Ye
  • Release date: July 2013
  • Publisher(s): CRC Press
  • ISBN: 9781482219388