Evaluating Machine Learning Models

Name: Evaluating Machine Learning Models
Author: Alice Zheng
ISBN: 9781491932445

September 2015

Intermediate to advanced

20 pages

1h 20m

English

O'Reilly Media, Inc.

Preface
1. Orientation
    The Machine Learning Workflow
    Evaluation Metrics
    Offline Evaluation Mechanisms
    Hyperparameter Search
    Online Testing Mechanisms
2. Evaluation Metrics
    Classification Metrics
    Accuracy
    Confusion Matrix
    Per-Class Accuracy
    Log-Loss
    AUC
    Ranking Metrics
    Precision-Recall
    Precision-Recall Curve and the F1 Score
    NDCG
    Regression Metrics
    RMSE
    Quantiles of Errors
    “Almost Correct” Predictions
    Caution: The Difference Between Training Metrics and Evaluation Metrics
    Caution: Skewed Datasets - Imbalanced Classes, Outliers, and Rare Data
    Related Reading
    Software Packages
3. Offline Evaluation Mechanisms: Hold-Out Validation, Cross-Validation, and Bootstrapping
    Unpacking the Prototyping Phase: Training, Validation, Model Selection
    Why Not Just Collect More Data?
    Hold-Out Validation
    Cross-Validation
    Bootstrap and Jackknife
    Caution: The Difference Between Model Validation and Testing
    Summary
    Related Reading
    Software Packages
4. Hyperparameter Tuning
    Model Parameters Versus Hyperparameters
    What Do Hyperparameters Do?
    Hyperparameter Tuning Mechanism
    Hyperparameter Tuning Algorithms
    Grid Search
    Random Search
    Smart Hyperparameter Tuning
    The Case for Nested Cross-Validation
    Related Reading
    Software Packages
5. The Pitfalls of A/B Testing
    A/B Testing: What Is It?
    Pitfalls of A/B Testing
    1. Complete Separation of Experiences
    2. Which Metric?
    3. How Much Change Counts as Real Change?
    4. One-Sided or Two-Sided Test?
    5. How Many False Positives Are You Willing to Tolerate?
    6. How Many Observations Do You Need?
    7. Is the Distribution of the Metric Gaussian?
    8. Are the Variances Equal?
    9. What Does the p-Value Mean?
    10. Multiple Models, Multiple Hypotheses
    11. How Long to Run the Test?
    12. Catching Distribution Drift
    Multi-Armed Bandits: An Alternative
    Related Reading
    That’s All, Folks!

Chapter 2. Evaluation Metrics

Evaluation metrics are tied to machine learning tasks. There are different metrics for the tasks of classification, regression, ranking, clustering, topic modeling, etc. Some metrics, such as precision-recall, are useful for multiple tasks. Classification, regression, and ranking are examples of supervised learning, which constitutes a majority of machine learning applications. We’ll focus on metrics for supervised learning models in this report.

Classification Metrics

Classification is about predicting class labels given input data. In binary classification, there are two possible output classes; in multiclass classification, there are more than two. I’ll focus on binary classification here, but all of the metrics can be extended to the multiclass scenario.

An example of binary classification is spam detection, where the input data could include the email text and metadata (sender, sending time), and the output label is either “spam” or “not spam.” (See Figure 2-1.) Sometimes, people use generic names for the two classes: “positive” and “negative,” or “class 1” and “class 0.”

There are many ways of measuring classification performance. Accuracy, confusion matrix, log-loss, and AUC are some of the most popular metrics. Precision-recall is also widely used; I’ll explain it in “Ranking Metrics”.
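
As a rough illustration of the four metrics just named, here is a minimal Python sketch using their standard textbook definitions, which may differ in detail from how the report presents them later. The toy labels and scores, the 0.5 decision threshold, and all function names are assumptions made for this example; class 1 stands for "spam" and class 0 for "not spam."

```python
import math

def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

def auc(y_true, y_prob):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs where the positive example gets the
    higher score; ties count as half a win."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels and predicted spam probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]
# Assumed threshold of 0.5 to turn probabilities into hard labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("confusion (tp, fp, fn, tn):", confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy(y_true, y_pred))
print("log-loss:", log_loss(y_true, y_prob))
print("AUC:", auc(y_true, y_prob))
```

Accuracy and the confusion matrix operate on hard labels, while log-loss and AUC use the predicted probabilities directly, so only the first two depend on the chosen threshold. The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration but not for large datasets.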

Figure 2-1. Email spam detection is ...