O'Reilly logo

Hands-On Reinforcement Learning with Python by Sudharsan Ravichandiran

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Learning from human preference

Learning from human preference is a major breakthrough in RL. The algorithm was proposed by researchers at OpenAI and DeepMind. The idea behind the algorithm is to make the agent learn according to human feedback. Initially, the agents act randomly and then two video clips of the agent performing an action are given to a human. The human can inspect the video clips and tell the agent which video clip is better, that is, in which video the agent is performing the task better and will lead it to achieving the goal. Once this feedback is given, the agent will try to do the actions preferred by the human and set the reward accordingly. Designing reward functions is one of the major challenges in RL, so having human ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required