Appendix A. AI Performance Metrics
This section collects the relevant quantitative performance metrics, including those for generative AI (GenAI) on Microsoft Azure at both the model and system levels.
AI Model Performance
These metrics relate directly to AI models (see Table A-1), covering classification and regression tasks for machine learning and deep learning, as well as metrics specific to traditional and modern language models. They serve as the baseline for evaluating quantitative performance during both preliminary testing and postproduction maintenance.
Table A-1. AI model performance metrics

| AI model type | Metric | Range of values | Purpose |
|---|---|---|---|
| Classification | AUC-ROC (area under the ROC curve) | 0 to 1 (higher is better) | Measures how well a model distinguishes between classes |
| | Precision | 0 to 1 (higher is better) | Measures the proportion of correctly identified positive results out of total predicted positives |
| | Recall | 0 to 1 (higher is better) | Measures the proportion of actual positives correctly identified |
| | F1 score (based on precision and recall) | 0 to 1 (higher is better) | Balances precision and recall for imbalanced datasets |
| | F2 score (based on precision and recall) | 0 to 1 (higher is better) | A weighted average of precision and recall that gives more weight to recall, favoring the capture of true positives |
| Regression | MAE (mean absolute error) | 0 to ∞ (lower is better) | Measures the average absolute error between predicted and actual values ... |
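To make the relationships between these metrics concrete, the following is a minimal sketch that computes precision, recall, F1, F2, and MAE by hand on hypothetical predictions. The labels and values are invented for illustration and are not tied to any Azure service or model.

```python
# Toy illustration of several metrics from Table A-1, computed from first
# principles on hypothetical data (no external libraries required).

def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def f_beta(precision, recall, beta):
    """F-beta score: beta=1 gives F1; beta=2 gives F2, weighting recall more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def mae(y_true, y_pred):
    """Mean absolute error between predicted and actual values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary classification labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} "
      f"F1={f_beta(p, r, 1):.2f} F2={f_beta(p, r, 2):.2f}")

# Hypothetical regression targets and predictions
print(f"MAE={mae([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]):.2f}")
```

In practice these values would come from a library such as scikit-learn rather than hand-rolled functions; the point here is only to show how each metric in the table is derived from the same set of predictions.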