Hindsight experience replay (HER) was introduced by OpenAI as a method for dealing with sparse rewards, but the algorithm has also been shown to generalize across tasks, due in part to the novel mechanism by which it works. The analogy used to explain HER is a game of shuffleboard, in which the object is to slide a disc down a long table to reach a target. When first learning the game, we often fail repeatedly, with the disc falling off the table or out of the playing area. In HER, however, it is presumed that we learn by expecting to fail and rewarding ourselves when we do. Then, internally, we can work backward by reducing the failure reward and thereby increasing the other, non-failure rewards. In some ways, ...
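To make the mechanism a little more concrete, here is a minimal sketch of hindsight relabeling on a one-dimensional shuffleboard. It is not the book's code: the `Transition` fields, the `GOAL_TOLERANCE` value, the helper names, and the 0 / -1 sparse reward are all assumptions chosen for illustration. The sketch replays a failed episode as if the spot where the disc actually stopped had been the goal all along, which is one common relabeling strategy (often called "final").

```python
from dataclasses import dataclass, replace
from typing import List

GOAL_TOLERANCE = 0.05  # assumed: how close the disc must stop to count as a hit


@dataclass(frozen=True)
class Transition:
    """One step of a hypothetical 1-D shuffleboard episode."""
    position: float       # disc position before the push
    action: float         # push applied (illustrative only)
    next_position: float  # disc position after the push
    goal: float           # goal the agent was conditioned on
    reward: float         # sparse reward: 0.0 on success, -1.0 otherwise


def sparse_reward(achieved: float, goal: float) -> float:
    """Return 0.0 if the disc is within tolerance of the goal, else -1.0."""
    return 0.0 if abs(achieved - goal) <= GOAL_TOLERANCE else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, pretending the final achieved position was the goal."""
    if not episode:
        return []
    hindsight_goal = episode[-1].next_position
    return [
        replace(
            t,
            goal=hindsight_goal,
            reward=sparse_reward(t.next_position, hindsight_goal),
        )
        for t in episode
    ]


# A failed episode: the agent aimed for 1.0 but the disc stopped at 0.6.
episode = [
    Transition(0.0, 0.5, 0.4, goal=1.0, reward=sparse_reward(0.4, 1.0)),
    Transition(0.4, 0.2, 0.6, goal=1.0, reward=sparse_reward(0.6, 1.0)),
]

# Store both the original and the relabeled transitions for replay.
relabeled = relabel_with_final_goal(episode)
replay_buffer = episode + relabeled
```

Every original transition carries the failure reward, while the relabeled copy of the last transition counts as a success for the substituted goal. A goal-conditioned, off-policy learner sampling from `replay_buffer` therefore sees some successful experience even when no attempt reached the intended target.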
Using hindsight experience replay