Evaluating Machine Learning Models

by Alice Zheng

September 2015

Intermediate to advanced

20 pages

1h 20m

English

O'Reilly Media, Inc.

The Machine Learning Workflow
Evaluation Metrics
Offline Evaluation Mechanisms
Hyperparameter Search
Online Testing Mechanisms

Classification Metrics
Accuracy
Confusion Matrix
Per-Class Accuracy
Log-Loss
AUC
Ranking Metrics
Precision-Recall
Precision-Recall Curve and the F1 Score
NDCG
Regression Metrics
RMSE
Quantiles of Errors
“Almost Correct” Predictions
Caution: The Difference Between Training Metrics and Evaluation Metrics
Caution: Skewed Datasets—Imbalanced Classes, Outliers, and Rare Data
Related Reading
Software Packages

Unpacking the Prototyping Phase: Training, Validation, Model Selection
Why Not Just Collect More Data?
Hold-Out Validation
Cross-Validation
Bootstrap and Jackknife
Caution: The Difference Between Model Validation and Testing
Summary
Related Reading
Software Packages

Model Parameters Versus Hyperparameters
What Do Hyperparameters Do?
Hyperparameter Tuning Mechanism
Hyperparameter Tuning Algorithms
Grid Search
Random Search
Smart Hyperparameter Tuning
The Case for Nested Cross-Validation
Related Reading
Software Packages

A/B Testing: What Is It?
Pitfalls of A/B Testing
1. Complete Separation of Experiences
2. Which Metric?
3. How Much Change Counts as Real Change?
4. One-Sided or Two-Sided Test?
5. How Many False Positives Are You Willing to Tolerate?
6. How Many Observations Do You Need?
7. Is the Distribution of the Metric Gaussian?
8. Are the Variances Equal?
9. What Does the p-Value Mean?
10. Multiple Models, Multiple Hypotheses
11. How Long to Run the Test?
12. Catching Distribution Drift
Multi-Armed Bandits: An Alternative
Related Reading
That’s All, Folks!

Content preview from Evaluating Machine Learning Models

Preface

This report on evaluating machine learning models arose out of a sense of need. The content was first published as a series of six technical posts on the Dato Machine Learning Blog. I was the editor of the blog, and I needed something to publish for the next day. Dato builds machine learning tools that help users build intelligent data products. In our conversations with the community, we sometimes ran into confusion over terminology. For example, people would ask for cross-validation as a feature when what they really meant was hyperparameter tuning, a feature we already had. So I thought, “Aha! I’ll just quickly explain what these concepts mean and point folks to the relevant sections in the user guide.”

So I sat down to write a blog post to explain cross-validation, hold-out datasets, and hyperparameter tuning. After the first two paragraphs, however, I realized that it would take a lot more than a single blog post. The three terms sit at different depths in the concept hierarchy of machine learning model evaluation. Cross-validation and hold-out validation are ways of chopping up a dataset in order to measure the model’s performance on “unseen” data. Hyperparameter tuning, on the other hand, is a more “meta” process of model selection. But why does the model need “unseen” data, and what’s meta about hyperparameters? In order to explain all of that, I needed to start from the basics. First, I needed to explain the high-level concepts and how they fit together. Only ...
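To make the distinction between the three terms concrete, here is a minimal sketch in plain Python. It is not taken from the report or from Dato's tools: the function names, the toy 1-D k-nearest-neighbor regressor, and the candidate values for its hyperparameter `k` are all assumptions chosen for illustration. Hold-out validation and cross-validation only decide which rows count as "unseen"; hyperparameter tuning is the outer loop that uses those splits to compare model settings.

```python
import random


def holdout_split(n, validation_fraction=0.25, seed=0):
    """Hold-out validation: shuffle once, set one slice aside as 'unseen' data."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * validation_fraction)
    return indices[n_val:], indices[:n_val]  # (train, validation)


def kfold_splits(n, k=5, seed=0):
    """Cross-validation: every row serves as 'unseen' data exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def knn_predict(train_x, train_y, x, k):
    """Toy model (hypothetical): average the targets of the k nearest points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def validation_mse(xs, ys, train, val, k):
    """Mean squared error on the validation rows, fitting on the train rows."""
    tx = [xs[i] for i in train]
    ty = [ys[i] for i in train]
    errors = [(knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in val]
    return sum(errors) / len(errors)


def tune_k(xs, ys, candidates, n_folds=5):
    """Hyperparameter tuning: the 'meta' loop that scores each setting by CV."""
    scores = {}
    for k in candidates:
        fold_errors = [
            validation_mse(xs, ys, train, val, k)
            for train, val in kfold_splits(len(xs), n_folds)
        ]
        scores[k] = sum(fold_errors) / len(fold_errors)
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.uniform(-1, 1) for _ in range(40)]
    ys = [x * x + rng.gauss(0, 0.05) for x in xs]  # synthetic data
    best, scores = tune_k(xs, ys, candidates=[1, 3, 5, 9])
    print("best k:", best)
```

Note that `tune_k` calls `kfold_splits` rather than the other way around: the splitting scheme is a building block, and tuning sits one level above it, which is the sense in which the preface places the terms at different depths.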


Publisher Resources

ISBN: 9781492048756
