Bias and variance

Overfitting is a problem that occurs with machine learning algorithms that are able to generate very accurate results on a training dataset but fail to generalize very well from what they've learned. We say that models which have overfit the data have very high variance. When we trained our decision tree on data that included the numeric age of passengers, we were overfitting the data.

Conversely, certain models may have very high bias. This is a situation where the model has a strong tendency towards a certain outcome irrespective of the training examples to the contrary. Recall our example of a classifier that always predicts that a survivor will perish. This classifier would perform well on dataset with low survivor rates, ...

Get Clojure for Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.