March 2017
Beginner to intermediate
866 pages
18h 4m
English
Any predictive model needs to be validated to see how it performs on different sets of data, and whether its accuracy remains consistent across similar data sources. This guards against over-fitting, where a model fits one dataset very well but performs poorly on another. One common method is to validate a model using a train-test split of the dataset. Another method is k-fold cross-validation, which we will learn more about in a later chapter.
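The two validation methods mentioned above can be sketched with scikit-learn; this is a minimal illustrative example (the dataset and model here are assumptions, not taken from the book):

```python
# Sketch: validating a model with a train-test split and k-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing; the model never sees it during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Comparing train and test accuracy hints at over-fitting:
# a large gap means the model does not generalize well.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# k-fold cross-validation: the data is split into k folds, and the model
# is trained and scored k times, each time holding out a different fold.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold accuracies:", scores.round(3))
```

A test accuracy close to the per-fold accuracies suggests the model's performance is stable across different subsets of the data.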
Ideally, this step should be done right at the onset of the modelling process so that there are no sampling biases in the model; in other words, the model should perform well even for a dataset that has the same ...