IN THIS CHAPTER
Explaining how correct sampling is critical in machine learning
Highlighting errors dictated by bias and variance
Proposing different approaches to validation and testing
Warning against biased samples, overfitting, underfitting, and snooping
“I’m not running around looking for love and validation …”
— SOPHIE B. HAWKINS
Having examples (in the form of data sets) and a machine learning algorithm at hand doesn’t guarantee that solving a learning problem is possible or that the results will provide the desired solution. For example, if you want your computer to distinguish a photo of a dog from a photo of a cat, you can provide it with good examples of dogs and cats. You then train a dog-versus-cat classifier, based on some machine learning algorithm, that outputs the probability that a given photo shows a dog rather than a cat. Of course, the output is only a probability, not an absolute assurance that the photo shows a dog or a cat.
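To make the idea of a probability output concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the chapter's own method: it assumes each photo has already been reduced to two made-up numeric features (hypothetically, snout length and ear pointiness), it generates synthetic examples instead of using real photos, and it uses logistic regression as a stand-in for "some machine learning algorithm." The function names are invented for this example.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba_dog(w, b, x):
    """Return the estimated probability that example x is a dog."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-ins for photos (assumption: two hypothetical features,
# snout length and ear pointiness, drawn from made-up distributions).
rng = random.Random(0)
dogs = [[rng.gauss(2.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(50)]
cats = [[rng.gauss(0.5, 0.5), rng.gauss(2.0, 0.5)] for _ in range(50)]
X = dogs + cats
y = [1] * 50 + [0] * 50  # 1 = dog, 0 = cat

w, b = train_logistic(X, y)

# The classifier reports a probability, never a certainty.
p_dog = predict_proba_dog(w, b, [2.0, 1.0])       # a dog-like example
p_cat_like = predict_proba_dog(w, b, [0.5, 2.0])  # a cat-like example
print(f"P(dog) for dog-like example: {p_dog:.3f}")
print(f"P(dog) for cat-like example: {p_cat_like:.3f}")
```

Both printed values fall strictly between 0 and 1. Even for a clearly dog-like example, the classifier expresses a degree of belief and not a guarantee.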
Based on the probability that the classifier reports, you can decide the class (dog or cat) of a photo based on the ...