In the preceding example, we set the penalty parameter to 0.1, but we could just as well have used 0.7 or 23.9, and the results would vary with each choice. If we pick an overly large value, we underfit; in the extreme, the learning system simply returns every coefficient equal to zero. If we pick a value that is too small, we are very close to OLS, which overfits and generalizes poorly (as we saw earlier).
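A minimal sketch of both extremes, assuming scikit-learn's `Lasso` and a small synthetic dataset invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.randn(50)

# Overly large penalty: every coefficient is shrunk to exactly zero.
big = Lasso(alpha=100.0).fit(X, y)
print(big.coef_)  # [0. 0. 0. 0. 0.]

# Tiny penalty: the fit is nearly indistinguishable from OLS.
small = Lasso(alpha=1e-6).fit(X, y)
ols = LinearRegression().fit(X, y)
print(np.allclose(small.coef_, ols.coef_, atol=1e-2))  # True
```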
How do we choose a good value? This is a general problem in machine learning: setting the parameters of our learning methods. A generic solution is to use cross-validation: we pick a set of candidate values, and then use cross-validation to choose the one that performs best. This performs more computation (k times more if we use k folds), but it is always applicable and lets us choose the value based on held-out performance rather than training fit.
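A hedged sketch of this recipe, again with scikit-learn: `LassoCV` fits the model for each candidate penalty under k-fold cross-validation and keeps the one that generalizes best. The candidate grid and `cv=5` below are arbitrary choices for illustration, not values from the text:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.randn(50)

# Candidate penalty values to compare via 5-fold cross-validation.
candidates = [0.001, 0.01, 0.1, 1.0, 10.0]
model = LassoCV(alphas=candidates, cv=5).fit(X, y)

print(model.alpha_)  # the penalty value that scored best on held-out folds
```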