Classification of the reviews
We begin this section by classifying the corpus with two algorithms we have already discussed, Naïve Bayes and k-NN. We then briefly introduce two new algorithms: logistic regression and support vector machines.
Document classification with k-NN
Since we already know k-Nearest Neighbors, we can go straight to the classification. We will try three neighbors and then five neighbors:
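library(class)  # knn() is in the class package
library(caret)  # confusionMatrix() is in the caret package
set.seed(975)   # knn() breaks ties at random, so fix the seed
# Column 1 of TrainDF holds the class label; the remaining columns are predictors.
# Note that the training set is passed as both the train and the test argument.
Class3n = knn(TrainDF[,-1], TrainDF[,-1], TrainDF[,1], k = 3)
Class5n = knn(TrainDF[,-1], TrainDF[,-1], TrainDF[,1], k = 5)
# Compare the 3-neighbor predictions with the true labels
confusionMatrix(Class3n, as.factor(TrainDF$quality))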
The confusion matrix and the ...
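The listing above classifies the same rows it was trained on, which tends to give an optimistic picture of accuracy. As a minimal sketch of the same knn() call evaluated on held-out rows, the example below uses the built-in iris data as a stand-in for the review data; the 70/30 split and the names train_idx, pred3, pred5, and conf3 are assumptions made for illustration and do not come from the text. It uses base R's table() rather than caret so that it only needs the class package.

```r
library(class)  # knn()

set.seed(975)

# iris stands in for the review data (an assumption for illustration):
# four numeric predictors in columns 1-4 and a factor label in column 5.
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))  # hypothetical 70/30 split

train_x <- iris[train_idx, -5]
test_x  <- iris[-train_idx, -5]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Classify the held-out rows with three and five neighbors.
pred3 <- knn(train_x, test_x, cl = train_y, k = 3)
pred5 <- knn(train_x, test_x, cl = train_y, k = 5)

# Cross-tabulate predictions against the true labels.
conf3 <- table(Predicted = pred3, Actual = test_y)
print(conf3)

# Share of held-out rows classified correctly for each k.
acc3 <- mean(pred3 == test_y)
acc5 <- mean(pred5 == test_y)
print(c(k3 = acc3, k5 = acc5))
```

Comparing acc3 and acc5 on held-out rows is one simple way to choose between the two values of k; with the review data, the same pattern would apply once a separate test data frame is available.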