Bias and variance
Overfitting is a problem that occurs with machine learning algorithms that are able to generate very accurate results on a training dataset but fail to generalize very well from what they've learned. We say that models which have overfit the data have very high variance. When we trained our decision tree on data that included the numeric age of passengers, we were overfitting the data.
Conversely, certain models may have very high bias. This is a situation where the model has a strong tendency towards a certain outcome irrespective of the training examples to the contrary. Recall our example of a classifier that always predicts that a survivor will perish. This classifier would perform well on dataset with low survivor rates, ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access