By now, you are familiar with the major standard approaches for aligning AI models to human preferences: reinforcement learning from human feedback (RLHF), reinforcement learning from AI feedback (RLAIF), and Constitutional AI. We showed that building a reward model is an essential step in RLHF, and that a poorly trained reward model can introduce serious limitations. In this chapter, we will explore a more recently introduced class of methods, collectively termed direct alignment from preferences (DAP): a family of methods that optimize policies directly from preference comparisons, with direct preference optimization (DPO) being the main approach. These methods do not need ...
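To give a concrete flavor of what "optimizing directly from preference comparisons" means, here is a minimal sketch of the DPO loss in PyTorch. It assumes the summed log-probabilities of each chosen (preferred) and rejected response have already been computed under the current policy and a frozen reference model; the function and variable names are illustrative, not from any particular library.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO objective for a batch of preference pairs.

    Each argument is a 1-D tensor holding log pi(y | x), summed over the
    tokens of a response, under either the trainable policy or the frozen
    reference model, for the chosen and rejected responses respectively.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Logistic loss on the reward margin: push the policy to prefer the
    # chosen response over the rejected one, with no separate reward
    # model and no reinforcement learning loop.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The point to notice is that the preference comparison enters the loss directly, through the margin between the two log-ratio terms, rather than through an explicitly trained reward model as in RLHF.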