Classification of the reviews

We begin this section by classifying the corpus with two algorithms we have already discussed, Naïve Bayes and k-NN. We then briefly introduce two new algorithms: logistic regression and support vector machines.

Document classification with k-NN

We already know k-Nearest Neighbors, so we'll jump straight into the classification. We will try three neighbors and then five:

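library(class) # knn() is in the class package
library(caret) # confusionMatrix() is in the caret package
set.seed(975)  # knn() breaks ties at random, so fix the seed
# Arguments: training features, cases to classify, training labels, k.
# Here the training set itself is classified.
Class3n = knn(TrainDF[,-1], TrainDF[,-1], TrainDF[,1], k = 3)
Class5n = knn(TrainDF[,-1], TrainDF[,-1], TrainDF[,1], k = 5)
# Compare the 3-neighbor predictions with the true labels
confusionMatrix(Class3n,as.factor(TrainDF$quality))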
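
For readers without the review corpus at hand, the same steps can be tried on a small synthetic data frame. The following is only a sketch under stated assumptions: ToyDF, its columns term_a and term_b, and the class labels are hypothetical stand-ins for TrainDF, and the confusion table is built with base R's table() rather than caret's confusionMatrix(), so that only the class package is needed.

```r
library(class)  # provides knn()

set.seed(975)

# Hypothetical toy stand-in for TrainDF: the first column holds the
# label, the remaining columns hold numeric features.
n <- 40
ToyDF <- data.frame(
  quality = factor(rep(c("negative", "positive"), each = n / 2)),
  term_a  = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3)),
  term_b  = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3))
)

# Classify the training rows themselves, as in the listing above
pred3 <- knn(ToyDF[, -1], ToyDF[, -1], ToyDF[, 1], k = 3)
pred5 <- knn(ToyDF[, -1], ToyDF[, -1], ToyDF[, 1], k = 5)

# Confusion tables: predictions in rows, true labels in columns
cm3 <- table(predicted = pred3, actual = ToyDF$quality)
cm5 <- table(predicted = pred5, actual = ToyDF$quality)

# Accuracy is the share of cases on the diagonal
acc3 <- sum(diag(cm3)) / sum(cm3)
acc5 <- sum(diag(cm5)) / sum(cm5)

print(cm3)
print(acc3)
```

Because the same rows serve as both training and test cases, each point counts itself among its nearest neighbors, so the accuracy obtained this way is likely to be optimistic compared with accuracy on held-out data.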

The confusion matrix and the ...

R: Predictive Analysis by Tony Fischetti, Eric Mayor, Rui Miguel Forte
