Chapter 1. Orientation
Cross-validation, RMSE, and grid search walk into a bar. The bartender looks up and says, “Who the heck are you?”
That was my attempt at a joke. If you’ve spent any time trying to decipher machine learning jargon, then maybe it made you chuckle. Machine learning as a field is full of technical terms, which makes it difficult for beginners to get started. One might come across terms like “deep learning,” “the kernel trick,” “regularization,” “overfitting,” “semi-supervised learning,” and “cross-validation.” But what in the world do they mean?
One of the core tasks in building a machine learning model is to evaluate its performance. It’s fundamental, and it’s also really hard. My mentors in machine learning research taught me to ask these questions at the outset of any project: “How can I measure success for this project?” and “How would I know when I’ve succeeded?” These questions allow me to set my goals realistically, so that I know when to stop. Sometimes they prevent me from working on ill-formulated projects where good measurement is vague or infeasible. It’s important to think about evaluation up front.
So how would we measure the success of a machine learning model? How would we know when to stop and call it good enough? To answer these questions, let’s take a tour of the landscape of machine learning model evaluation.
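Before we set off, here is a glimpse of the three strangers from the joke working together to measure a model. This is a minimal sketch, assuming scikit-learn is available; the synthetic dataset, the Ridge model, and the parameter grid are illustrative choices, not recommendations.

# Illustrative sketch (assumes scikit-learn): grid search picks a model setting,
# cross-validation scores each candidate, and RMSE summarizes prediction error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# A synthetic regression problem stands in for real data.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Grid search tries each candidate regularization strength,
# scoring each one with 5-fold cross-validation on negative mean squared error.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

# RMSE is the square root of the (positive) mean squared error.
rmse = np.sqrt(-search.best_score_)
print(f"best alpha: {search.best_params_['alpha']}, cross-validated RMSE: {rmse:.2f}")

Each of these pieces gets its own treatment later; for now, the point is simply that they are ordinary, concrete tools for answering the question “how well does this model do?”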
The Machine Learning Workflow
There are multiple stages in developing a machine learning model for use in a software application. It follows that ...