March 2026
In this chapter, our focus shifts to the role of human feedback in reinforcement learning (RL), taking a broader perspective on it. In the previous chapter, we showed how RL with human feedback (RLHF) can substantially improve and speed up policy training compared with traditional RL methods such as Q-learning. Human feedback in RL has evolved across many applications, in both practical settings and the research literature, and has surged in popularity in recent years, especially for large language models. While our focus here remains broad, we won't delve into applications of human feedback in large language modeling just yet. Instead, our aim is ...
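To make the core idea concrete before we broaden the discussion, the following is a minimal, illustrative sketch of one common way human feedback enters RL: fitting a reward model to pairwise human preferences (a Bradley-Terry style objective) and using it to score trajectories. The feature vectors, variable names, and the linear reward model are assumptions for illustration, not the specific method of any system discussed here.

```python
import numpy as np

# Toy sketch: learn a reward model from pairwise human preferences
# and use it to rank trajectories. All names and the feature setup
# are illustrative assumptions.

# Each trajectory is summarized by a small feature vector.
traj_a = np.array([1.0, 0.2])   # e.g. reached the goal quickly
traj_b = np.array([0.1, 0.9])   # e.g. wandered off course

# Human feedback: (preferred, rejected) trajectory pairs.
preferences = [(traj_a, traj_b)] * 50

w = np.zeros(2)                  # reward-model parameters
lr = 0.5                         # learning rate

for preferred, rejected in preferences:
    # P(preferred beats rejected) under a Bradley-Terry model
    # with linear per-trajectory rewards r(x) = w @ x.
    diff = preferred - rejected
    p = 1.0 / (1.0 + np.exp(-w @ diff))
    # Gradient ascent on the log-likelihood of the human's choice.
    w += lr * (1.0 - p) * diff

# The learned reward should now rank the preferred trajectory higher.
print(w @ traj_a > w @ traj_b)   # → True
```

A policy can then be trained against this learned reward in place of (or in addition to) an environment reward, which is the pattern the rest of the chapter builds on.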