In the previous chapter, you were introduced to reinforcement learning from human feedback (RLHF), how it can be used to align language models, and how language model fine-tuning works. In this chapter, we will discuss some advanced topics and recent trends in this domain. Building reward models from human preferences can be challenging: collecting human annotations is difficult, ensuring that annotators provide consistent and correct feedback is hard, and human biases can skew the data. These issues can be mitigated by using a diversified pool of human labelers, but while this is feasible, sourcing human feedback at scale becomes expensive. In this chapter, we will also discuss some alternative ...
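To make the preference-modeling setup concrete, here is a minimal sketch of how a reward model can be trained on pairwise human preferences using the Bradley-Terry objective: the model assigns a scalar score to each response, and the loss pushes the human-preferred response to score higher than the rejected one. The RewardModel class, its dimensions, and the stand-in embeddings below are hypothetical illustrations, not code from this book.

# A minimal sketch of the pairwise (Bradley-Terry) loss commonly used to
# train reward models from human preference data. RewardModel and the
# stand-in embeddings are hypothetical placeholders for illustration.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a pooled sequence embedding to a scalar score."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score_head = nn.Linear(hidden_size, 1)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return self.score_head(pooled_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the log-probability that the
    # human-preferred response scores higher than the rejected one.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Usage with random stand-in embeddings for a batch of four preference pairs:
model = RewardModel()
chosen_emb, rejected_emb = torch.randn(4, 768), torch.randn(4, 768)
loss = preference_loss(model(chosen_emb), model(rejected_emb))
loss.backward()

Every weakness of this objective inherits directly from the annotations: noisy, inconsistent, or biased preference pairs translate into a miscalibrated reward signal, which is precisely why the alternatives discussed in this chapter are attractive.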