March 2026
Intermediate to advanced
402 pages
11h 1m
English
Throughout this book, we have explored Reinforcement Learning from Human Feedback (RLHF) and related emerging methods that use human feedback as essential frameworks for aligning language models with human intentions and preferences. From reward modeling to policy fine-tuning, our discussions centered on text, a domain where misalignment is visible in every sentence and word. But humans interact with the world through more than language alone: we perceive and value images, auditory signals and music, movements, and combinations of these in complex, nuanced ways. The same core idea that drives RLHF in text, iterative learning from explicit or implicit human signals, can align models in these other domains ...
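To make the "core idea" concrete, the reward-modeling step mentioned above is commonly trained with a pairwise (Bradley-Terry style) preference loss: the model is penalized when it scores a human-rejected response above a human-chosen one. The snippet below is a minimal illustrative sketch of that loss on scalar reward scores, not code from the book; the function name and example scores are assumptions for illustration.

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large when the ordering
    is violated. (Illustrative sketch, not the book's implementation.)
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Preference respected: low loss.
print(bradley_terry_loss(2.0, 0.0))
# Preference violated: high loss.
print(bradley_terry_loss(0.0, 2.0))
```

The same loss applies unchanged whether the scored items are text completions, images, or audio clips, which is why the framework transfers across modalities: only the reward model's input encoder changes.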