Here is how we apply text2vec:
- Load the required packages and dataset:
library(text2vec)
library(glmnet)
data("movie_review")
- Define a function that fits a Lasso logistic regression and prints the train (cross-validated) and test AUC values:
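logistic_model <- function(Xtrain, Ytrain, Xtest, Ytest) {
  # Lasso (alpha = 1) logistic regression, tuned by 5-fold cross-validated AUC
  classifier <- cv.glmnet(x = Xtrain, y = Ytrain, family = "binomial",
                          alpha = 1, type.measure = "auc",
                          nfolds = 5, maxit = 1000)
  plot(classifier)
  # Predicted probabilities for the test set
  vocab_test_pred <- predict(classifier, Xtest, type = "response")
  # glmnet:::auc is an internal (unexported) glmnet function
  return(cat("Train AUC : ", round(max(classifier$cvm), 4),
             "Test AUC : ", glmnet:::auc(Ytest, vocab_test_pred), "\n"))
}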
- Split the movie review data into train and test sets in an 80:20 ratio:
train_samples <- caret::createDataPartition(c(1:length(labels[1,1])),p ...
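For readers who want to see the idea behind the 80:20 split in isolation, here is a minimal base R sketch. It is not the caret call used above: it draws a plain random sample with `sample()`, whereas `caret::createDataPartition` samples within the levels of the outcome. The `reviews` data frame is a hypothetical stand-in for `movie_review`, and its `id` and `sentiment` columns are assumptions made for illustration.

```r
# Hypothetical toy data standing in for movie_review
# (assumption: one row per review, with a binary sentiment label).
set.seed(42)
reviews <- data.frame(
  id = seq_len(100),
  sentiment = rep(c(0, 1), each = 50)
)

# Draw 80% of the row indices at random for the training set.
n <- nrow(reviews)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

# The rows not drawn form the test set.
train <- reviews[train_idx, ]
test  <- reviews[-train_idx, ]

cat("Train rows:", nrow(train), "Test rows:", nrow(test), "\n")
```

Fixing the seed makes the split reproducible. If the two classes are unbalanced, a stratified split is usually the safer choice, because it keeps the class proportions similar in both sets.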