Logistic regression with cross-validation

The purpose of cross-validation is to estimate how well a model will predict unseen data and to reduce the chance of overfitting. With K-fold cross-validation, the dataset is split into K roughly equal-sized parts. The algorithm alternately holds out one of the K parts: it fits a model to the other K-1 parts and obtains predictions for the held-out part. The results are then averaged across the K folds to estimate the prediction error, and appropriate features are selected on that basis. You can also perform Leave-One-Out Cross-Validation (LOOCV), where K is equal to the number of observations, so each held-out part is a single observation. Simulations have shown that the LOOCV method can have averaged estimates that have a high variance. As a result, most machine learning experts will recommend ...
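
The hold-out-and-average procedure described above can be sketched in a few lines. This is only an illustrative sketch in standard-library Python, not the book's own R code: the synthetic data, the helper names (kfold_indices, fit_logistic, predict_proba, cross_validate), and the gradient-descent settings are all assumptions made for the example.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds (sizes differ by at most one)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def predict_proba(w, x):
    """Logistic model: w[0] is the intercept, w[1:] are the feature weights."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit the weights by plain batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def cross_validate(X, y, k):
    """Return (mean misclassification rate, per-fold rates) over k folds."""
    rates = []
    for held_out in kfold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit on the other K-1 parts, then predict the held-out part.
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        wrong = sum((predict_proba(w, X[i]) >= 0.5) != (y[i] == 1) for i in held_out)
        rates.append(wrong / len(held_out))
    return sum(rates) / k, rates


# Synthetic data (an assumption for illustration): one noisy predictor.
rng = random.Random(42)
X = [[rng.gauss(0, 1)] for _ in range(40)]
y = [1 if x[0] + rng.gauss(0, 0.5) > 0 else 0 for x in X]

mean_err, fold_errs = cross_validate(X, y, k=5)       # 5-fold CV
loo_err, loo_errs = cross_validate(X, y, k=len(X))    # LOOCV: K = number of observations
print(f"5-fold error: {mean_err:.3f}   LOOCV error: {loo_err:.3f}")
```

Setting k to the number of observations reproduces LOOCV: every fold then holds exactly one observation, so each per-fold error rate is either 0 or 1.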
