Data Classification

Book Description

Comprehensive Coverage of the Entire Area of Classification

Research on the problem of classification tends to be fragmented across such areas as pattern recognition, databases, data mining, and machine learning. Addressing the work of these different communities in a unified way, Data Classification: Algorithms and Applications explores the underlying algorithms of classification as well as applications of classification in a variety of problem domains, including text, multimedia, social network, and biological data.

This comprehensive book focuses on three primary aspects of data classification:

  • Methods: The book first describes common techniques used for classification, including probabilistic methods, decision trees, rule-based methods, instance-based methods, support vector machine methods, and neural networks.
  • Domains: The book then examines specific methods used for data domains such as multimedia, text, time-series, network, discrete sequence, and uncertain data. It also covers large data sets and data streams, reflecting the growing importance of the big data paradigm.
  • Variations: The book concludes with insight on variations of the classification process. It discusses ensembles, rare-class learning, distance function learning, active learning, visual learning, transfer learning, and semi-supervised learning as well as evaluation aspects of classifiers.
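As a small taste of the instance-based methods named in the Methods bullet above, here is a minimal k-nearest-neighbor classifier in plain Python. This is an illustrative sketch only; the function name, toy data, and choice of Euclidean distance are ours, not drawn from the book.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy two-class data: points near the origin are "a", points near (1, 1) are "b".
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]

print(knn_predict(train, (0.15, 0.1)))   # query near the "a" cluster
print(knn_predict(train, (1.05, 0.95)))  # query near the "b" cluster
```

The same lazy-learning idea, with refinements such as distance weighting and attribute weighting, is the subject of Chapter 6.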

Table of Contents

  1. Preliminaries
  2. Series
  3. Dedication
  4. Editor Biography
  5. Contributors
  6. Preface
  7. Chapter 1 An Introduction to Data Classification
    1. 1.1 Introduction
    2. 1.2 Common Techniques in Data Classification
      1. 1.2.1 Feature Selection Methods
      2. 1.2.2 Probabilistic Methods
      3. 1.2.3 Decision Trees
      4. 1.2.4 Rule-Based Methods
      5. 1.2.5 Instance-Based Learning
      6. 1.2.6 SVM Classifiers
      7. 1.2.7 Neural Networks
    3. 1.3 Handling Different Data Types
      1. 1.3.1 Large Scale Data: Big Data and Data Streams
        1. 1.3.1.1 Data Streams
        2. 1.3.1.2 The Big Data Framework
      2. 1.3.2 Text Classification
      3. 1.3.3 Multimedia Classification
      4. 1.3.4 Time Series and Sequence Data Classification
      5. 1.3.5 Network Data Classification
      6. 1.3.6 Uncertain Data Classification
    4. 1.4 Variations on Data Classification
      1. 1.4.1 Rare Class Learning
      2. 1.4.2 Distance Function Learning
      3. 1.4.3 Ensemble Learning for Data Classification
      4. 1.4.4 Enhancing Classification Methods with Additional Data
        1. 1.4.4.1 Semi-Supervised Learning
        2. 1.4.4.2 Transfer Learning
      5. 1.4.5 Incorporating Human Feedback
        1. 1.4.5.1 Active Learning
        2. 1.4.5.2 Visual Learning
      6. 1.4.6 Evaluating Classification Algorithms
    5. 1.5 Discussion and Conclusions
    6. Bibliography
      1. Figure 1.1
      2. Figure 1.2
      3. Figure 1.3
      4. Figure 1.4
      5. Figure 1.5
      1. Table 1.1
  8. Chapter 2 Feature Selection for Classification: A Review
    1. 2.1 Introduction
      1. 2.1.1 Data Classification
      2. 2.1.2 Feature Selection
      3. 2.1.3 Feature Selection for Classification
    2. 2.2 Algorithms for Flat Features
      1. 2.2.1 Filter Models
      2. 2.2.2 Wrapper Models
      3. 2.2.3 Embedded Models
    3. 2.3 Algorithms for Structured Features
      1. 2.3.1 Features with Group Structure
      2. 2.3.2 Features with Tree Structure
      3. 2.3.3 Features with Graph Structure
    4. 2.4 Algorithms for Streaming Features
      1. 2.4.1 The Grafting Algorithm
      2. 2.4.2 The Alpha-Investing Algorithm
      3. 2.4.3 The Online Streaming Feature Selection Algorithm
    5. 2.5 Discussions and Challenges
      1. 2.5.1 Scalability
      2. 2.5.2 Stability
      3. 2.5.3 Linked Data
    6. Acknowledgments
    7. Bibliography
      1. Figure 2.1
      2. Figure 2.2
      3. Figure 2.3
      4. Figure 2.4
      5. Figure 2.5
      6. Figure 2.6
      7. Figure 2.7
      8. Figure 2.8
      9. Figure 2.9
      10. Figure 2.10
  9. Chapter 3 Probabilistic Models for Classification
    1. 3.1 Introduction
    2. 3.2 Naive Bayes Classification
      1. 3.2.1 Bayes’ Theorem and Preliminary
      2. 3.2.2 Naive Bayes Classifier
      3. 3.2.3 Maximum-Likelihood Estimates for Naive Bayes Models
      4. 3.2.4 Applications
    3. 3.3 Logistic Regression Classification
      1. 3.3.1 Logistic Regression
      2. 3.3.2 Parameter Estimation for Logistic Regression
      3. 3.3.3 Regularization in Logistic Regression
      4. 3.3.4 Applications
    4. 3.4 Probabilistic Graphical Models for Classification
      1. 3.4.1 Bayesian Networks
        1. 3.4.1.1 Bayesian Network Construction
        2. 3.4.1.2 Inference in a Bayesian Network
        3. 3.4.1.3 Learning Bayesian Networks
      2. 3.4.2 Hidden Markov Models
        1. 3.4.2.1 The Inference and Learning Algorithms
      3. 3.4.3 Markov Random Fields
        1. 3.4.3.1 Conditional Independence
        2. 3.4.3.2 Clique Factorization
        3. 3.4.3.3 The Inference and Learning Algorithms
      4. 3.4.4 Conditional Random Fields
        1. 3.4.4.1 The Learning Algorithms
    5. 3.5 Summary
    6. Bibliography
      1. Figure 3.1
      2. Figure 3.2
      3. Figure 3.3
      4. Figure 3.4
  10. Chapter 4 Decision Trees: Theory and Algorithms
    1. 4.1 Introduction
    2. 4.2 Top-Down Decision Tree Induction
      1. 4.2.1 Node Splitting
      2. 4.2.2 Tree Pruning
    3. 4.3 Case Studies with C4.5 and CART
      1. 4.3.1 Splitting Criteria
      2. 4.3.2 Stopping Conditions
      3. 4.3.3 Pruning Strategy
      4. 4.3.4 Handling Unknown Values: Induction and Prediction
      5. 4.3.5 Other Issues: Windowing and Multivariate Criteria
    4. 4.4 Scalable Decision Tree Construction
      1. 4.4.1 RainForest-Based Approach
      2. 4.4.2 SPIES Approach
      3. 4.4.3 Parallel Decision Tree Construction
    5. 4.5 Incremental Decision Tree Induction
      1. 4.5.1 ID3 Family
      2. 4.5.2 VFDT Family
      3. 4.5.3 Ensemble Method for Streaming Data
    6. 4.6 Summary
    7. Bibliography
      1. Figure 4.1
      2. Figure 4.2
      3. Figure 4.3
      4. Figure 4.4
      5. Figure 4.5
      6. Figure 4.6
      1. Table 4.1
      2. Table 4.2
      3. Table 4.3
  11. Chapter 5 Rule-Based Classification
    1. 5.1 Introduction
    2. 5.2 Rule Induction
      1. 5.2.1 Two Algorithms for Rule Induction
        1. 5.2.1.1 CN2 Induction Algorithm (Ordered Rules)
        2. 5.2.1.2 RIPPER Algorithm and Its Variations (Ordered Classes)
      2. 5.2.2 Learn One Rule in Rule Learning
    3. 5.3 Classification Based on Association Rule Mining
      1. 5.3.1 Association Rule Mining
        1. 5.3.1.1 Definitions of Association Rules, Support, and Confidence
        2. 5.3.1.2 The Introduction of Apriori Algorithm
      2. 5.3.2 Mining Class Association Rules
      3. 5.3.3 Classification Based on Associations
        1. 5.3.3.1 Additional Discussion for CARs Mining
        2. 5.3.3.2 Building a Classifier Using CARs
      4. 5.3.4 Other Techniques for Association Rule-Based Classification
    4. 5.4 Applications
      1. 5.4.1 Text Categorization
      2. 5.4.2 Intrusion Detection
      3. 5.4.3 Using Class Association Rules for Diagnostic Data Mining
      4. 5.4.4 Gene Expression Data Analysis
    5. 5.5 Discussion and Conclusion
    6. Bibliography
      1. Table 5.1
      2. Table 5.2
      3. Table 5.3
  12. Chapter 6 Instance-Based Learning: A Survey
    1. 6.1 Introduction
    2. 6.2 Instance-Based Learning Framework
    3. 6.3 The Nearest Neighbor Classifier
      1. 6.3.1 Handling Symbolic Attributes
      2. 6.3.2 Distance-Weighted Nearest Neighbor Methods
      3. 6.3.3 Local Distance Scaling
      4. 6.3.4 Attribute-Weighted Nearest Neighbor Methods
      5. 6.3.5 Locally Adaptive Nearest Neighbor Classifier
      6. 6.3.6 Combining with Ensemble Methods
      7. 6.3.7 Multi-Label Learning
    4. 6.4 Lazy SVM Classification
    5. 6.5 Locally Weighted Regression
    6. 6.6 Lazy Naive Bayes
    7. 6.7 Lazy Decision Trees
    8. 6.8 Rule-Based Classification
    9. 6.9 Radial Basis Function Networks: Leveraging Neural Networks for Instance-Based Learning
    10. 6.10 Lazy Methods for Diagnostic and Visual Classification
    11. 6.11 Conclusions and Summary
    12. Bibliography
      1. Figure 6.1
      2. Figure 6.2
      3. Figure 6.3
      4. Figure 6.4
      5. Figure 6.5
  13. Chapter 7 Support Vector Machines
    1. 7.1 Introduction
    2. 7.2 The Maximum Margin Perspective
    3. 7.3 The Regularization Perspective
    4. 7.4 The Support Vector Perspective
    5. 7.5 Kernel Tricks
    6. 7.6 Solvers and Algorithms
    7. 7.7 Multiclass Strategies
    8. 7.8 Conclusion
    9. Bibliography
      1. Figure 7.1
      2. Figure 7.2
      3. Figure 7.3
      4. Figure 7.4
      5. Figure 7.5
      6. Figure 7.6
      7. Figure 7.7
      8. Figure 7.8
      9. Figure 7.9
  14. Chapter 8 Neural Networks: A Review
    1. 8.1 Introduction
    2. 8.2 Fundamental Concepts
      1. 8.2.1 Mathematical Model of a Neuron
      2. 8.2.2 Types of Units
        1. 8.2.2.1 McCulloch-Pitts Binary Threshold Unit
        2. 8.2.2.2 Linear Unit
        3. 8.2.2.3 Linear Threshold Unit
        4. 8.2.2.4 Sigmoidal Unit
        5. 8.2.2.5 Distance Unit
        6. 8.2.2.6 Radial Basis Unit
        7. 8.2.2.7 Polynomial Unit
      3. 8.2.3 Network Topology
        1. 8.2.3.1 Layered Network
        2. 8.2.3.2 Networks with Feedback
        3. 8.2.3.3 Modular Networks
      4. 8.2.4 Computation and Knowledge Representation
      5. 8.2.5 Learning
        1. 8.2.5.1 Hebbian Rule
        2. 8.2.5.2 The Delta Rule
    3. 8.3 Single-Layer Neural Network
      1. 8.3.1 The Single-Layer Perceptron
        1. 8.3.1.1 Perceptron Criterion
        2. 8.3.1.2 Multi-Class Perceptrons
        3. 8.3.1.3 Perceptron Enhancements
      2. 8.3.2 Adaline
        1. 8.3.2.1 Two-Class Adaline
        2. 8.3.2.2 Multi-Class Adaline
      3. 8.3.3 Learning Vector Quantization (LVQ)
        1. 8.3.3.1 LVQ1 Training
        2. 8.3.3.2 LVQ2 Training
        3. 8.3.3.3 Application and Limitations
    4. 8.4 Kernel Neural Network
      1. 8.4.1 Radial Basis Function Network
      2. 8.4.2 RBFN Training
        1. 8.4.2.1 Using Training Samples as Centers
        2. 8.4.2.2 Random Selection of Centers
        3. 8.4.2.3 Unsupervised Selection of Centers
        4. 8.4.2.4 Supervised Estimation of Centers
        5. 8.4.2.5 Linear Optimization of Weights
        6. 8.4.2.6 Gradient Descent and Enhancements
      3. 8.4.3 RBF Applications
    5. 8.5 Multi-Layer Feedforward Network
      1. 8.5.1 MLP Architecture for Classification
        1. 8.5.1.1 Two-Class Problems
        2. 8.5.1.2 Multi-Class Problems
        3. 8.5.1.3 Forward Propagation
      2. 8.5.2 Error Metrics
        1. 8.5.2.1 Mean Square Error (MSE)
        2. 8.5.2.2 Cross-Entropy (CE)
        3. 8.5.2.3 Minimum Classification Error (MCE)
      3. 8.5.3 Learning by Backpropagation
      4. 8.5.4 Enhancing Backpropagation
        1. 8.5.4.1 Backpropagation with Momentum
        2. 8.5.4.2 Delta-Bar-Delta
        3. 8.5.4.3 Rprop Algorithm
        4. 8.5.4.4 Quick-Prop
      5. 8.5.5 Generalization Issues
      6. 8.5.6 Model Selection
    6. 8.6 Deep Neural Networks
      1. 8.6.1 Use of Prior Knowledge
      2. 8.6.2 Layer-Wise Greedy Training
        1. 8.6.2.1 Deep Belief Networks (DBNs)
        2. 8.6.2.2 Stacked Auto-Encoder
      3. 8.6.3 Limits and Applications
    7. 8.7 Summary
    8. Acknowledgements
    9. Bibliography
      1. Figure 8.1
      2. Figure 8.2
      3. Figure 8.3
      4. Figure 8.4
      1. Table 8.1
  15. Chapter 9 A Survey of Stream Classification Algorithms
    1. 9.1 Introduction
    2. 9.2 Generic Stream Classification Algorithms
      1. 9.2.1 Decision Trees for Data Streams
      2. 9.2.2 Rule-Based Methods for Data Streams
      3. 9.2.3 Nearest Neighbor Methods for Data Streams
      4. 9.2.4 SVM Methods for Data Streams
      5. 9.2.5 Neural Network Classifiers for Data Streams
      6. 9.2.6 Ensemble Methods for Data Streams
    3. 9.3 Rare Class Stream Classification
      1. 9.3.1 Detecting Rare Classes
      2. 9.3.2 Detecting Novel Classes
      3. 9.3.3 Detecting Infrequently Recurring Classes
    4. 9.4 Discrete Attributes: The Massive Domain Scenario
    5. 9.5 Other Data Domains
      1. 9.5.1 Text Streams
      2. 9.5.2 Graph Streams
      3. 9.5.3 Uncertain Data Streams
    6. 9.6 Conclusions and Summary
    7. Bibliography
      1. Figure 9.1
      2. Figure 9.2
      1. Table 9.1
  16. Chapter 10 Big Data Classification
    1. 10.1 Introduction
    2. 10.2 Scale-Up on a Single Machine
      1. 10.2.1 Background
      2. 10.2.2 SVMPerf
      3. 10.2.3 Pegasos
      4. 10.2.4 Bundle Methods
    3. 10.3 Scale-Up by Parallelism
      1. 10.3.1 Parallel Decision Trees
      2. 10.3.2 Parallel SVMs
      3. 10.3.3 MRM-ML
      4. 10.3.4 SystemML
    4. 10.4 Conclusion
    5. Bibliography
  17. Chapter 11 Text Classification
    1. 11.1 Introduction
    2. 11.2 Feature Selection for Text Classification
      1. 11.2.1 Gini Index
      2. 11.2.2 Information Gain
      3. 11.2.3 Mutual Information
      4. 11.2.4 χ2-Statistic
      5. 11.2.5 Feature Transformation Methods: Unsupervised and Supervised LSI
      6. 11.2.6 Supervised Clustering for Dimensionality Reduction
      7. 11.2.7 Linear Discriminant Analysis
      8. 11.2.8 Generalized Singular Value Decomposition
      9. 11.2.9 Interaction of Feature Selection with Classification
    3. 11.3 Decision Tree Classifiers
    4. 11.4 Rule-Based Classifiers
    5. 11.5 Probabilistic and Naive Bayes Classifiers
      1. 11.5.1 Bernoulli Multivariate Model
      2. 11.5.2 Multinomial Distribution
      3. 11.5.3 Mixture Modeling for Text Classification
    6. 11.6 Linear Classifiers
      1. 11.6.1 SVM Classifiers
      2. 11.6.2 Regression-Based Classifiers
      3. 11.6.3 Neural Network Classifiers
      4. 11.6.4 Some Observations about Linear Classifiers
    7. 11.7 Proximity-Based Classifiers
    8. 11.8 Classification of Linked and Web Data
    9. 11.9 Meta-Algorithms for Text Classification
      1. 11.9.1 Classifier Ensemble Learning
      2. 11.9.2 Data Centered Methods: Boosting and Bagging
      3. 11.9.3 Optimizing Specific Measures of Accuracy
    10. 11.10 Leveraging Additional Training Data
      1. 11.10.1 Semi-Supervised Learning
      2. 11.10.2 Transfer Learning
      3. 11.10.3 Active Learning
    11. 11.11 Conclusions and Summary
    12. Bibliography
      1. Figure 11.1
      2. Figure 11.2
      3. Figure 11.3
      4. Figure 11.4
  18. Chapter 12 Multimedia Classification
    1. 12.1 Introduction
      1. 12.1.1 Overview
    2. 12.2 Feature Extraction and Data Pre-Processing
      1. 12.2.1 Text Features
      2. 12.2.2 Image Features
      3. 12.2.3 Audio Features
      4. 12.2.4 Video Features
    3. 12.3 Audio Visual Fusion
      1. 12.3.1 Fusion Methods
      2. 12.3.2 Audio Visual Speech Recognition
        1. 12.3.2.1 Visual Front End
        2. 12.3.2.2 Decision Fusion on HMM
      3. 12.3.3 Other Applications
    4. 12.4 Ontology-Based Classification and Inference
      1. 12.4.1 Popular Applied Ontology
      2. 12.4.2 Ontological Relations
        1. 12.4.2.1 Definition
        2. 12.4.2.2 Subclass Relation
        3. 12.4.2.3 Co-Occurrence Relation
        4. 12.4.2.4 Combination of the Two Relations
        5. 12.4.2.5 Inherently Used Ontology
    5. 12.5 Geographical Classification with Multimedia Data
      1. 12.5.1 Data Modalities
      2. 12.5.2 Challenges in Geographical Classification
      3. 12.5.3 Geo-Classification for Images
        1. 12.5.3.1 Classifiers
      4. 12.5.4 Geo-Classification for Web Videos
    6. 12.6 Conclusion
    7. Bibliography
      1. Figure 12.1
      2. Figure 12.2
      3. Figure 12.3
      4. Figure 12.4
      5. Figure 12.5
      6. Figure 12.6
      7. Figure 12.7
      8. Figure 12.8
      9. Figure 12.9
  19. Chapter 13 Time Series Data Classification
    1. 13.1 Introduction
    2. 13.2 Time Series Representation
    3. 13.3 Distance Measures
      1. 13.3.1 Lp-Norms
      2. 13.3.2 Dynamic Time Warping (DTW)
      3. 13.3.3 Edit Distance
      4. 13.3.4 Longest Common Subsequence (LCSS)
    4. 13.4 k-NN
      1. 13.4.1 Speeding up the k-NN
    5. 13.5 Support Vector Machines (SVMs)
    6. 13.6 Classification Trees
    7. 13.7 Model-Based Classification
    8. 13.8 Distributed Time Series Classification
    9. 13.9 Conclusion
    10. Acknowledgements
    11. Bibliography
  20. Chapter 14 Discrete Sequence Classification
    1. 14.1 Introduction
    2. 14.2 Background
      1. 14.2.1 Sequence
      2. 14.2.2 Sequence Classification
      3. 14.2.3 Frequent Sequential Patterns
      4. 14.2.4 n-Grams
    3. 14.3 Sequence Classification Methods
    4. 14.4 Feature-Based Classification
      1. 14.4.1 Filtering Method for Sequential Feature Selection
      2. 14.4.2 Pattern Mining Framework for Mining Sequential Features
      3. 14.4.3 A Wrapper-Based Method for Mining Sequential Features
    5. 14.5 Distance-Based Methods
        1. 14.5.0.1 Alignment-Based Distance
        2. 14.5.0.2 Keyword-Based Distance
        3. 14.5.0.3 Kernel-Based Similarity
        4. 14.5.0.4 Model-Based Similarity
        5. 14.5.0.5 Time Series Distance Metrics
    6. 14.6 Model-Based Method
    7. 14.7 Hybrid Methods
    8. 14.8 Non-Traditional Sequence Classification
      1. 14.8.1 Semi-Supervised Sequence Classification
      2. 14.8.2 Classification of Label Sequences
      3. 14.8.3 Classification of Sequence of Vector Data
    9. 14.9 Conclusions
    10. Bibliography
  21. Chapter 15 Collective Classification of Network Data
    1. 15.1 Introduction
    2. 15.2 Collective Classification Problem Definition
      1. 15.2.1 Inductive vs. Transductive Learning
      2. 15.2.2 Active Collective Classification
    3. 15.3 Iterative Methods
      1. 15.3.1 Label Propagation
      2. 15.3.2 Iterative Classification Algorithms
    4. 15.4 Graph-Based Regularization
    5. 15.5 Probabilistic Graphical Models
      1. 15.5.1 Directed Models
      2. 15.5.2 Undirected Models
      3. 15.5.3 Approximate Inference in Graphical Models
        1. 15.5.3.1 Gibbs Sampling
        2. 15.5.3.2 Loopy Belief Propagation (LBP)
    6. 15.6 Feature Construction
      1. 15.6.1 Data Graph
      2. 15.6.2 Relational Features
    7. 15.7 Applications of Collective Classification
    8. 15.8 Conclusion
    9. Acknowledgements
    10. Bibliography
      1. Figure 15.1
      2. Figure 15.2
  22. Chapter 16 Uncertain Data Classification
    1. 16.1 Introduction
    2. 16.2 Preliminaries
      1. 16.2.1 Data Uncertainty Models
      2. 16.2.2 Classification Framework
    3. 16.3 Classification Algorithms
      1. 16.3.1 Decision Trees
      2. 16.3.2 Rule-Based Classification
      3. 16.3.3 Associative Classification
      4. 16.3.4 Density-Based Classification
      5. 16.3.5 Nearest Neighbor-Based Classification
      6. 16.3.6 Support Vector Classification
      7. 16.3.7 Naive Bayes Classification
    4. 16.4 Conclusions
    5. Bibliography
      1. Figure 16.1
      2. Figure 16.2
      3. Figure 16.3
      4. Figure 16.4
      5. Figure 16.5
  23. Chapter 17 Rare Class Learning
    1. 17.1 Introduction
    2. 17.2 Rare Class Detection
      1. 17.2.1 Cost Sensitive Learning
        1. 17.2.1.1 MetaCost: A Relabeling Approach
        2. 17.2.1.2 Weighting Methods
        3. 17.2.1.3 Bayes Classifiers
        4. 17.2.1.4 Proximity-Based Classifiers
        5. 17.2.1.5 Rule-Based Classifiers
        6. 17.2.1.6 Decision Trees
        7. 17.2.1.7 SVM Classifier
      2. 17.2.2 Adaptive Re-Sampling
        1. 17.2.2.1 Relation between Weighting and Sampling
        2. 17.2.2.2 Synthetic Over-Sampling: SMOTE
        3. 17.2.2.3 One Class Learning with Positive Class
        4. 17.2.2.4 Ensemble Techniques
      3. 17.2.3 Boosting Methods
    3. 17.3 The Semi-Supervised Scenario: Positive and Unlabeled Data
      1. 17.3.1 Difficult Cases and One-Class Learning
    4. 17.4 The Semi-Supervised Scenario: Novel Class Detection
      1. 17.4.1 One Class Novelty Detection
      2. 17.4.2 Combining Novel Class Detection with Rare Class Detection
      3. 17.4.3 Online Novelty Detection
    5. 17.5 Human Supervision
    6. 17.6 Other Work
    7. 17.7 Conclusions and Summary
    8. Bibliography
      1. Figure 17.1
      2. Figure 17.2
  24. Chapter 18 Distance Metric Learning for Data Classification
    1. 18.1 Introduction
    2. 18.2 The Definition of Distance Metric Learning
    3. 18.3 Supervised Distance Metric Learning Algorithms
      1. 18.3.1 Linear Discriminant Analysis (LDA)
      2. 18.3.2 Margin Maximizing Discriminant Analysis (MMDA)
      3. 18.3.3 Learning with Side Information (LSI)
      4. 18.3.4 Relevant Component Analysis (RCA)
      5. 18.3.5 Information Theoretic Metric Learning (ITML)
      6. 18.3.6 Neighborhood Component Analysis (NCA)
      7. 18.3.7 Average Neighborhood Margin Maximization (ANMM)
      8. 18.3.8 Large Margin Nearest Neighbor Classifier (LMNN)
    4. 18.4 Advanced Topics
      1. 18.4.1 Semi-Supervised Metric Learning
        1. 18.4.1.1 Laplacian Regularized Metric Learning (LRML)
        2. 18.4.1.2 Constraint Margin Maximization (CMM)
      2. 18.4.2 Online Learning
        1. 18.4.2.1 Pseudo-Metric Online Learning Algorithm (POLA)
        2. 18.4.2.2 Online Information Theoretic Metric Learning (OITML)
    5. 18.5 Conclusions and Discussions
    6. Bibliography
      1. Table 18.1
      2. Table 18.2
  25. Chapter 19 Ensemble Learning
    1. 19.1 Introduction
    2. 19.2 Bayesian Methods
      1. 19.2.1 Bayes Optimal Classifier
      2. 19.2.2 Bayesian Model Averaging
      3. 19.2.3 Bayesian Model Combination
    3. 19.3 Bagging
      1. 19.3.1 General Idea
      2. 19.3.2 Random Forest
    4. 19.4 Boosting
      1. 19.4.1 General Boosting Procedure
      2. 19.4.2 AdaBoost
    5. 19.5 Stacking
      1. 19.5.1 General Stacking Procedure
      2. 19.5.2 Stacking and Cross-Validation
      3. 19.5.3 Discussions
    6. 19.6 Recent Advances in Ensemble Learning
    7. 19.7 Conclusions
    8. Bibliography
      1. Figure 19.1
      1. Table 19.1
      2. Table 19.2
      3. Table 19.3
      4. Table 19.4
      5. Table 19.5
      6. Table 19.6
      7. Table 19.7
      8. Table 19.8
      9. Table 19.9
      10. Table 19.10
      11. Table 19.11
      12. Table 19.12
      13. Table 19.13
      14. Table 19.14
      15. Table 19.15
  26. Chapter 20 Semi-Supervised Learning
    1. 20.1 Introduction
      1. 20.1.1 Transductive vs. Inductive Semi-Supervised Learning
      2. 20.1.2 Semi-Supervised Learning Framework and Assumptions
    2. 20.2 Generative Models
      1. 20.2.1 Algorithms
      2. 20.2.2 Description of a Representative Algorithm
      3. 20.2.3 Theoretical Justification and Relevant Results
    3. 20.3 Co-Training
      1. 20.3.1 Algorithms
      2. 20.3.2 Description of a Representative Algorithm
      3. 20.3.3 Theoretical Justification and Relevant Results
    4. 20.4 Graph-Based Methods
      1. 20.4.1 Algorithms
        1. 20.4.1.1 Graph Cut
        2. 20.4.1.2 Graph Transduction
        3. 20.4.1.3 Manifold Regularization
        4. 20.4.1.4 Random Walk
        5. 20.4.1.5 Large Scale Learning
      2. 20.4.2 Description of a Representative Algorithm
      3. 20.4.3 Theoretical Justification and Relevant Results
    5. 20.5 Semi-Supervised Learning Methods Based on Cluster Assumption
      1. 20.5.1 Algorithms
      2. 20.5.2 Description of a Representative Algorithm
      3. 20.5.3 Theoretical Justification and Relevant Results
    6. 20.6 Related Areas
    7. 20.7 Concluding Remarks
    8. Bibliography
      1. Figure 20.1
      2. Figure 20.2
  27. Chapter 21 Transfer Learning
    1. 21.1 Introduction
    2. 21.2 Transfer Learning Overview
      1. 21.2.1 Background
      2. 21.2.2 Notations and Definitions
    3. 21.3 Homogenous Transfer Learning
      1. 21.3.1 Instance-Based Approach
        1. 21.3.1.1 Case I: No Target Labeled Data
        2. 21.3.1.2 Case II: A Few Target Labeled Data
      2. 21.3.2 Feature-Representation-Based Approach
        1. 21.3.2.1 Encoding Specific Knowledge for Feature Learning
        2. 21.3.2.2 Learning Features by Minimizing Distance between Distributions
        3. 21.3.2.3 Learning Features Inspired by Multi-Task Learning
        4. 21.3.2.4 Learning Features Inspired by Self-Taught Learning
        5. 21.3.2.5 Other Feature Learning Approaches
      3. 21.3.3 Model-Parameter-Based Approach
      4. 21.3.4 Relational-Information-Based Approaches
    4. 21.4 Heterogeneous Transfer Learning
      1. 21.4.1 Heterogeneous Feature Spaces
      2. 21.4.2 Different Label Spaces
    5. 21.5 Transfer Bounds and Negative Transfer
    6. 21.6 Other Research Issues
      1. 21.6.1 Binary Classification vs. Multi-Class Classification
      2. 21.6.2 Knowledge Transfer from Multiple Source Domains
      3. 21.6.3 Transfer Learning Meets Active Learning
    7. 21.7 Applications of Transfer Learning
      1. 21.7.1 NLP Applications
      2. 21.7.2 Web-Based Applications
      3. 21.7.3 Sensor-Based Applications
      4. 21.7.4 Applications to Computer Vision
      5. 21.7.5 Applications to Bioinformatics
      6. 21.7.6 Other Applications
    8. 21.8 Concluding Remarks
    9. Bibliography
      1. Figure 21.1
      1. Table 21.1
      2. Table 21.2
      3. Table 21.3
  28. Chapter 22 Active Learning: A Survey
    1. 22.1 Introduction
    2. 22.2 Motivation and Comparisons to Other Strategies
      1. 22.2.1 Comparison with Other Forms of Human Feedback
      2. 22.2.2 Comparisons with Semi-Supervised and Transfer Learning
    3. 22.3 Querying Strategies
      1. 22.3.1 Heterogeneity-Based Models
        1. 22.3.1.1 Uncertainty Sampling
        2. 22.3.1.2 Query-by-Committee
        3. 22.3.1.3 Expected Model Change
      2. 22.3.2 Performance-Based Models
        1. 22.3.2.1 Expected Error Reduction
        2. 22.3.2.2 Expected Variance Reduction
      3. 22.3.3 Representativeness-Based Models
      4. 22.3.4 Hybrid Models
    4. 22.4 Active Learning with Theoretical Guarantees
      1. 22.4.1 A Simple Example
      2. 22.4.2 Existing Works
      3. 22.4.3 Preliminaries
      4. 22.4.4 Importance Weighted Active Learning
        1. 22.4.4.1 Algorithm
        2. 22.4.4.2 Consistency
        3. 22.4.4.3 Label Complexity
    5. 22.5 Dependency-Oriented Data Types for Active Learning
      1. 22.5.1 Active Learning in Sequences
      2. 22.5.2 Active Learning in Graphs
        1. 22.5.2.1 Classification of Many Small Graphs
        2. 22.5.2.2 Node Classification in a Single Large Graph
    6. 22.6 Advanced Methods
      1. 22.6.1 Active Learning of Features
      2. 22.6.2 Active Learning of Kernels
      3. 22.6.3 Active Learning of Classes
      4. 22.6.4 Streaming Active Learning
      5. 22.6.5 Multi-Instance Active Learning
      6. 22.6.6 Multi-Label Active Learning
      7. 22.6.7 Multi-Task Active Learning
      8. 22.6.8 Multi-View Active Learning
      9. 22.6.9 Multi-Oracle Active Learning
      10. 22.6.10 Multi-Objective Active Learning
      11. 22.6.11 Variable Labeling Costs
      12. 22.6.12 Active Transfer Learning
      13. 22.6.13 Active Reinforcement Learning
    7. 22.7 Conclusions
    8. Bibliography
      1. Figure 22.1
      2. Figure 22.2
  29. Chapter 23 Visual Classification
    1. 23.1 Introduction
      1. 23.1.1 Requirements for Visual Classification
      2. 23.1.2 Visualization Metaphors
        1. 23.1.2.1 2D and 3D Spaces
        2. 23.1.2.2 More Complex Metaphors
      3. 23.1.3 Challenges in Visual Classification
      4. 23.1.4 Related Works
    2. 23.2 Approaches
      1. 23.2.1 Nomograms
        1. 23.2.1.1 Naïve Bayes Nomogram
      2. 23.2.2 Parallel Coordinates
        1. 23.2.2.1 Edge Cluttering
      3. 23.2.3 Radial Visualizations
        1. 23.2.3.1 Star Coordinates
      4. 23.2.4 Scatter Plots
        1. 23.2.4.1 Clustering
        2. 23.2.4.2 Naïve Bayes Classification
      5. 23.2.5 Topological Maps
        1. 23.2.5.1 Self-Organizing Maps
        2. 23.2.5.2 Generative Topographic Mapping
      6. 23.2.6 Trees
        1. 23.2.6.1 Decision Trees
        2. 23.2.6.2 Treemap
        3. 23.2.6.3 Hyperbolic Tree
        4. 23.2.6.4 Phylogenetic Trees
    3. 23.3 Systems
      1. 23.3.1 EnsembleMatrix and ManiMatrix
      2. 23.3.2 Systematic Mapping
      3. 23.3.3 iVisClassifier
      4. 23.3.4 ParallelTopics
      5. 23.3.5 VisBricks
      6. 23.3.6 WHIDE
      7. 23.3.7 Text Document Retrieval
    4. 23.4 Summary and Conclusions
    5. Bibliography
      1. Figure 23.1
      2. Figure 23.2
      3. Figure 23.3
      4. Figure 23.4
      5. Figure 23.5
      6. Figure 23.6
      7. Figure 23.7
  30. Chapter 24 Evaluation of Classification Methods
    1. 24.1 Introduction
    2. 24.2 Validation Schemes
    3. 24.3 Evaluation Measures
      1. 24.3.1 Accuracy Related Measures
        1. 24.3.1.1 Discrete Classifiers
        2. 24.3.1.2 Probabilistic Classifiers
      2. 24.3.2 Additional Measures
    4. 24.4 Comparing Classifiers
      1. 24.4.1 Parametric Statistical Comparisons
        1. 24.4.1.1 Pairwise Comparisons
        2. 24.4.1.2 Multiple Comparisons
      2. 24.4.2 Non-Parametric Statistical Comparisons
        1. 24.4.2.1 Pairwise Comparisons
        2. 24.4.2.2 Multiple Comparisons
        3. 24.4.2.3 Permutation Tests
    5. 24.5 Concluding Remarks
    6. Bibliography
      1. Figure 24.1
      2. Figure 24.2
      3. Figure 24.3
      4. Figure 24.4
      1. Table 24.1
      2. Table 24.2
      3. Table 24.3
      4. Table 24.4
      5. Table 24.5
  31. Chapter 25 Educational and Software Resources for Data Classification
    1. 25.1 Introduction
    2. 25.2 Educational Resources
      1. 25.2.1 Books on Data Classification
      2. 25.2.2 Popular Survey Papers on Data Classification
    3. 25.3 Software for Data Classification
      1. 25.3.1 Data Benchmarks for Software and Research
    4. 25.4 Summary
    5. Bibliography