Machine Learning Algorithms for Classification
Just as with regression, there are classification problems for which linear methods don't work well. This section describes some machine learning algorithms for classification problems.
k Nearest Neighbors
One of the simplest techniques for classification problems is k nearest neighbors. Here's how the algorithm works (a minimal sketch follows this list):

- The analyst specifies a "training" data set.
- To predict the class of a new value, the algorithm looks for the k observations in the training set that are closest to the new value.
- The prediction for the new value is the class of the "majority" of the k nearest neighbors.
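To make the voting idea concrete, here is a minimal sketch of the algorithm in base R. It is illustrative only; the function name simple_knn and its arguments are hypothetical, and in practice you would use the knn function from the class package, described next.

# Minimal sketch of k nearest neighbors in base R (illustrative only).
# "train" is a numeric matrix of training observations, "cl" is a factor of
# their classes, and "newpoint" is a single numeric vector to classify.
simple_knn <- function(train, cl, newpoint, k = 3) {
  # Euclidean distance from the new value to every training observation
  dists <- sqrt(rowSums((train - matrix(newpoint, nrow(train), ncol(train),
                                        byrow = TRUE))^2))
  # classes of the k closest training observations
  nearest <- cl[order(dists)[1:k]]
  # majority vote: the most frequent class among the k neighbors
  names(which.max(table(nearest)))
}

# Example call (hypothetical):
# simple_knn(as.matrix(iris[, 1:4]), iris$Species, c(6.0, 3.0, 4.8, 1.8), k = 5)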
To use k nearest neighbors in R, use the knn function in the class package:
library(class)
knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)
Here is a description of the arguments to the knn function.
Argument | Description | Default |
---|---|---|
train | A matrix or data frame containing the training data. | |
test | A matrix or data frame containing the test data. | |
cl | A factor specifying the classification of observations in the training set. | |
k | A numeric value specifying the number of neighbors to consider. | 1 |
l | When k > 0, specifies the minimum vote for a decision. (If there aren't enough votes, the value doubt is returned.) | 0 |
prob | If prob=TRUE, the proportion of votes for the winning class is returned as the attribute prob. | FALSE |
use.all | Controls the handling of ties when selecting nearest neighbors. If use.all=TRUE, all distances equal to the kth largest are included. If use.all=FALSE, a random selection of distances equal to the kth largest is used so that exactly k neighbors are considered. | TRUE |
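As a brief example of how these arguments fit together, the sketch below fits knn to the built-in iris data. The data set choice, the random train/test split, and the value k = 3 are illustrative assumptions, not part of the original text.

library(class)

# Split iris into training and test sets, then classify the test observations.
set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]
cl    <- iris$Species[idx]

pred <- knn(train, test, cl, k = 3, prob = TRUE)

# proportion of correct predictions on the held-out observations
mean(pred == iris$Species[-idx])

# proportion of votes for the winning class, returned because prob = TRUE
head(attr(pred, "prob"))

Because prob = TRUE, the returned factor carries a prob attribute giving, for each test observation, the fraction of the k neighbors that voted for the winning class.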