Dealing with reducible error components

High bias:

  • Add more features
  • Apply a more complex model
  • Use fewer instances to train
  • Reduce regularization
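
As a rough illustration of the first two remedies, the sketch below fits two models to the same toy data: a constant (mean-only) predictor, which underfits, and a straight line that uses the input as a feature. The data and the helper names (`mse`, `fit_constant`, `fit_line`) are invented for this example and are not taken from the book.

```python
def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_constant(xs, ys):
    """High-bias baseline: ignore x and always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical toy data with a clear linear trend plus small deviations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

constant_model = fit_constant(xs, ys)
line_model = fit_line(xs, ys)

constant_error = mse(ys, [constant_model(x) for x in xs])
line_error = mse(ys, [line_model(x) for x in xs])

print(f"constant model training MSE: {constant_error:.3f}")
print(f"linear model training MSE:   {line_error:.3f}")
```

The constant model cannot follow the trend however much data it sees, which is the signature of high bias; giving the model a feature to work with removes most of the training error.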

High variance:

  • Conduct feature selection and use fewer features
  • Get more training data
  • Use regularization to constrain overly complex models
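
To make the regularization remedy concrete, here is a minimal sketch assuming a one-feature, no-intercept ridge (L2-penalized) regression, chosen only because it has a simple closed form. The function name `ridge_slope`, the `penalty` parameter, and the data are hypothetical.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w of a no-intercept ridge fit y ~ w * x.

    Minimizes sum((y - w * x)^2) + penalty * w^2, whose closed-form
    solution is w = sum(x * y) / (sum(x * x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical toy data, roughly y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]

for penalty in (0.0, 1.0, 10.0):
    print(f"penalty={penalty:>4}: slope={ridge_slope(xs, ys, penalty):.3f}")
```

A penalty of zero gives the ordinary least-squares slope. Raising the penalty shrinks the slope toward zero, which makes the fit less sensitive to the particular training sample (lower variance) at the cost of some bias. Lowering it again is what "reduce regularization" amounts to in the high-bias list.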

Cross-validation

Cross-validation is an important step in model validation and evaluation. It is a technique for estimating how well a model performs before we apply it to an unseen dataset. Training the model on all of the available data is not advised, because nothing is then left to test on and we would have no idea how the model is going to perform in practice. As we learnt in the previous section, a good learner should be able to generalize well on an unseen ...
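
The excerpt breaks off before describing a specific procedure, so the following is only a sketch of one common variant, k-fold cross-validation, using the Python standard library. The helper names (`k_fold_indices`, `cross_validate`, `fit_mean`), the toy learner, and the data are all assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, k=5, seed=0):
    """Return the validation MSE obtained on each of the k held-out folds.

    `fit(train_xs, train_ys)` must return a function mapping x to a prediction.
    """
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(xs)) if i not in held_out_set]
        # Train only on the remaining folds, never on the held-out one.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - model(xs[i])) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


def fit_mean(train_xs, train_ys):
    """Toy learner: always predict the mean of the training targets."""
    mean_y = sum(train_ys) / len(train_ys)
    return lambda x: mean_y


# Hypothetical data: 20 points on the line y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

scores = cross_validate(xs, ys, fit_mean, k=5)
print("per-fold MSE:", [round(s, 2) for s in scores])
print("mean MSE:", round(sum(scores) / len(scores), 2))
```

Each observation is held out exactly once, so the averaged score estimates performance on data the model did not see during training. Any learner can be passed in as `fit`, as long as it returns a prediction function.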

