In this chapter, we explored the fundamental ideas behind data quality, showed how to categorize quality issues by type, and presented ideas for tidying up your data.
To compare the performance of different models, we then established some fundamental measures of model performance, such as the mean squared error (MSE) for regression and the classification error rate for classification.
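As a brief reminder of what these two measures compute, here is a minimal sketch in plain Python. The function names and the sample values are illustrative assumptions, not code from this chapter.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_error_rate(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, for illustration only.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])    # (1 + 0 + 4) / 3
err = classification_error_rate(["a", "b", "a", "b"],
                                ["a", "a", "a", "b"])         # 1 mistake in 4
```

In both cases a lower value indicates a better fit to the evaluated data.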
We also introduced cross-validation as a generic assessment technique for cases where only a limited amount of data is available.
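The idea can be sketched as follows, assuming k-fold cross-validation with contiguous, unshuffled folds and a deliberately trivial "model" that always predicts the mean of its training targets. The helper names and the data are hypothetical and chosen only to keep the example self-contained; in practice the data would usually be shuffled first and a real model used.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_mean_predictor(y, k):
    """Return the held-out MSE of a mean predictor for each of the k folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)   # "fit" on the training folds
        mse = sum((y[i] - prediction) ** 2 for i in held_out) / len(held_out)
        scores.append(mse)                     # score on the held-out fold
    return scores


# Made-up targets, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = cross_validate_mean_predictor(y, k=3)
cv_mse = sum(scores) / len(scores)             # average over the folds
```

Every observation is used for both training and evaluation, but never in the same round, which is what makes the approach attractive when data is scarce.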
Finally, we discussed learning curves as a way to judge a model's ability to learn and to improve its scores.
With a firm grounding ...