May 2019 · Intermediate to advanced · 664 pages · 15h 41m · English
I'll keep the commentary brief in this section, since we covered these techniques in Chapter 4, Advanced Feature Selection in Linear Models. We'll build our model with LASSO and check its performance on the test data. First, let's specify our x and y for the cv.glmnet() function:
> x <- dtm_train_tfidf
> y <- as.factor(train$party)
The minimum number of cross-validation folds that cv.glmnet() allows is three, which is what we'll use given the small number of observations:
> set.seed(123)
> lasso <- glmnet::cv.glmnet(
    x,
    y,
    nfolds = 3,
    type.measure = "class",
    alpha = 1,
    family = "binomial"
  )
> plot(lasso)
The output of the preceding code is as follows:
Wow! All those input features and just a handful are relevant, and the area under the curve ...
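To see exactly which features the penalty kept, you can pull the nonzero coefficients at lambda.1se and then score new data with predict(). The sketch below is self-contained: the simulated matrix and labels are stand-ins for the chapter's dtm_train_tfidf and party labels, not objects from the book's data.

```r
library(glmnet)

set.seed(123)
# Stand-in for a TF-IDF document-term matrix: 60 documents, 200 terms
x_sim <- matrix(abs(rnorm(60 * 200)), nrow = 60,
                dimnames = list(NULL, paste0("term", 1:200)))
y_sim <- factor(rep(c("R", "D"), each = 30))

cv_fit <- glmnet::cv.glmnet(x_sim, y_sim, nfolds = 3,
                            type.measure = "class",
                            alpha = 1, family = "binomial")

# Coefficients at the 1-SE lambda; most will be exactly zero under LASSO
coefs <- coef(cv_fit, s = "lambda.1se")
kept  <- rownames(coefs)[which(coefs != 0)]

# Class predictions on new data (here reusing x_sim as a placeholder
# for a held-out test matrix such as dtm_test_tfidf)
preds <- predict(cv_fit, newx = x_sim, type = "class", s = "lambda.1se")
```

The same two calls, coef() for the surviving terms and predict() with type = "class", are how you would score the chapter's actual test matrix.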