Chapter 13

Feature Selection and Evaluation

Selecting the right features for classification is a central task in all areas of pattern recognition and machine learning, and it is a very difficult problem. In practice, adding a new feature to an existing feature vector may increase or decrease performance depending on the features already present. The search for the optimal feature vector is an NP-complete problem. In this chapter, we will discuss some common techniques that can be adopted with relative ease.
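
Because an exhaustive search over all feature subsets is infeasible, greedy heuristics are commonly used instead. One such technique is sequential forward selection: start from the empty set and repeatedly add the single feature that most improves a score, stopping when no candidate helps. The sketch below is illustrative only; the toy scoring function and all names are our own inventions (in practice the score would be something like cross-validated accuracy), but it shows how a feature's usefulness depends on the features already selected.

```python
def forward_select(features, score):
    """Greedy sequential forward selection: repeatedly add the single
    feature that most improves the score; stop when nothing improves."""
    selected, best = frozenset(), score(frozenset())
    remaining = set(features)
    while remaining:
        cand, cand_score = max(
            ((f, score(selected | {f})) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best:  # no remaining feature improves the score
            break
        selected, best = selected | {cand}, cand_score
        remaining.discard(cand)
    return selected, best

def toy_score(subset):
    """Hypothetical subset score: 'a' and 'b' are individually useful but
    partly redundant, 'c' only helps alongside 'a', and 'd' is noise."""
    s = 0.5
    if "a" in subset: s += 0.2
    if "b" in subset: s += 0.15
    if {"a", "b"} <= subset: s -= 0.1   # redundant pair
    if {"a", "c"} <= subset: s += 0.1   # interaction effect
    if "d" in subset: s -= 0.05         # harmful noise feature
    return s

sel, best = forward_select("abcd", toy_score)
```

Note that the greedy search here picks "c" before "b" even though "b" scores higher in isolation, because once "a" is selected the interaction with "c" is worth more than the partly redundant "b".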

13.1 Overfitting and Underfitting

To achieve optimal classification accuracy, the model must have just the right level of complexity. Model complexity is determined by many factors, one of which is the dimensionality of the feature space. The more features we use, the more degrees of freedom we have to fit the model, and the more complex it becomes.
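
The link between feature count and degrees of freedom can be made concrete: a linear model over n features has n + 1 free parameters (including a bias term), so it can fit n + 1 training points in general position exactly. The sketch below, with a small Gaussian-elimination solver of our own and randomly generated data for illustration, shows a linear model with four parameters interpolating four training points perfectly.

```python
import random

def solve(A, b):
    """Solve the square linear system A w = b by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

random.seed(2)
d = 3                                   # number of features
points = [[random.random() for _ in range(d)] for _ in range(d + 1)]
targets = [random.random() for _ in range(d + 1)]

# Design matrix with a leading bias column: d + 1 parameters,
# d + 1 training points -> the fit is exact, whatever the targets are.
X = [[1.0] + p for p in points]
w = solve(X, targets)
preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
max_err = max(abs(p - t) for p, t in zip(preds, targets))
```

Since the targets here are pure noise, the zero training error says nothing about how the model would behave on new data; this is exactly the overfitting risk discussed next.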

To understand what happens when a model is too complex or too simple, it is useful to study both the training error rate e_train and the testing error rate e_test. We are familiar with the testing error rate from Chapter 10, where we defined the accuracy as 1 − e_test, while the training error rate is obtained by testing the classifier on the training set. Obviously, ...
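
The gap between the two error rates is easy to see in a small experiment. The following is a minimal pure-Python sketch (the synthetic data and all names are illustrative, not from the text): a 1-nearest-neighbour classifier drives the training error rate to exactly zero, because every training point is its own nearest neighbour, while the testing error rate on fresh data from the same overlapping classes stays far above zero.

```python
import random

random.seed(0)

def sample(n):
    """Synthetic 1-D data: class 1 tends to larger x, with heavy class overlap."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        data.append((random.gauss(float(y), 1.0), y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if sum(label for _, label in nearest) * 2 > k else 0

def error_rate(train, data, k):
    """Fraction of points in `data` misclassified by k-NN fitted on `train`."""
    return sum(knn_predict(train, x, k) != y for x, y in data) / len(data)

train, test = sample(200), sample(200)
e_train = error_rate(train, train, 1)  # 1-NN: each point is its own neighbour
e_test = error_rate(train, test, 1)    # error on unseen data is much larger
```

The training error rate is thus a badly biased estimate of real-world performance, which is why model evaluation must rest on a separate testing set.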
