11
Model Evaluation
Model evaluation is the phase in which we assess a model's general performance beyond a single downstream task. For language models, it measures intrinsic capabilities as reflected through standardized evaluation protocols, across linguistic, statistical, and generative dimensions. Unlike traditional supervised learning, where clear ground-truth labels are available for evaluation, reinforcement learning from human feedback (RLHF) and other human-alignment methods, such as direct preference optimization (DPO) and reinforcement learning from AI feedback (RLAIF), rely on human judgments, preference comparisons, and often qualitative assessments. In practice, the reward models used in RLHF are typically ...
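To make the contrast with ground-truth evaluation concrete, consider how a preference comparison can be scored. The following is a minimal sketch, not a full evaluation pipeline: it assumes hypothetical scalar scores already produced by a reward model for pairs of responses, and uses the Bradley-Terry formulation common in RLHF reward modeling, where the probability that response A is preferred over response B is the sigmoid of their score difference. A reward model can then be checked by its agreement rate with human preference labels rather than against a single correct answer.

import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given scalar reward-model scores for the two responses."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

def agreement_rate(score_pairs, human_labels) -> float:
    """Fraction of preference pairs on which the reward model agrees
    with the human judgment. score_pairs holds (score_a, score_b)
    tuples; each label is 1 if the human preferred A, else 0."""
    correct = sum(
        int((sa > sb) == bool(label))
        for (sa, sb), label in zip(score_pairs, human_labels)
    )
    return correct / len(score_pairs)

# Hypothetical reward-model scores for three response pairs, with the
# corresponding human preference labels (1 = response A preferred).
pairs = [(2.1, 0.4), (0.3, 1.2), (1.8, 1.7)]
labels = [1, 0, 1]

print(f"P(A preferred) for first pair: {preference_probability(*pairs[0]):.3f}")
print(f"Agreement with human labels: {agreement_rate(pairs, labels):.2f}")

Note that the "ground truth" here is itself a set of human judgments, which is exactly what distinguishes this style of evaluation from supervised metrics computed against fixed labels.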