Machine Learning Algorithms for Classification
Just as with regression, there are classification problems where linear methods don’t work well. This section describes some machine learning algorithms for classification problems.
k Nearest Neighbors
One of the simplest techniques for classification problems is k nearest neighbors. Here’s how the algorithm works:
1. The analyst specifies a “training” data set.
2. To predict the class of a new value, the algorithm looks for the k observations in the training set that are closest to the new value.
3. The prediction for the new value is the class of the majority of the k nearest neighbors.
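The steps above can be sketched in a few lines of base R. The function name `knn_predict` and the toy data are illustrative, not part of any package:

```r
# A minimal sketch of the k nearest neighbors steps above, in base R.
knn_predict <- function(train, cl, new_point, k = 3) {
  # Euclidean distance from the new point to each training observation
  dists <- sqrt(rowSums((t(t(train) - new_point))^2))
  # Classes of the k closest training observations
  nearest <- cl[order(dists)[1:k]]
  # Majority vote among the k neighbors
  names(which.max(table(nearest)))
}

# Two well-separated clusters of training points
train <- rbind(matrix(c(1, 1, 1, 2, 2, 1), ncol = 2, byrow = TRUE),
               matrix(c(8, 8, 8, 9, 9, 8), ncol = 2, byrow = TRUE))
cl <- c("a", "a", "a", "b", "b", "b")
knn_predict(train, cl, c(8.5, 8.5), k = 3)  # "b"
```

A new point near the second cluster is assigned class `"b"` because all three of its nearest neighbors belong to that class.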
To use k nearest neighbors in R, use the function knn in the class package:

```r
library(class)
knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)
```
Here are descriptions of the arguments to the knn function:
| Argument | Description | Default |
|---|---|---|
| train | A matrix or data frame containing the training data. | |
| test | A matrix or data frame containing the test data. | |
| cl | A factor specifying the classification of observations in the training set. | |
| k | A numeric value specifying the number of neighbors to consider. | 1 |
| l | When l > 0, specifies the minimum vote needed for a definite decision. (If there aren’t enough votes, the value doubt is returned.) | 0 |
| prob | If prob=TRUE, then the proportion of votes for the winning class is returned as attribute prob. | FALSE |
| use.all | Controls the handling of ties when selecting nearest neighbors. If use.all=TRUE, then all distances equal to the kth largest are included. If use.all=FALSE, then a random selection of distances equal to the kth largest is chosen, so that exactly k neighbors are used. | TRUE |