Hindsight experience replay

One characteristic of humans is that they learn from their mistakes and adapt their behavior to avoid repeating them. This is the idea behind RL algorithms. Their implementation becomes difficult, however, when rewards are sparse. Consider the following scenario: an agent must control a robot arm to open a box and place an object inside it. The reward for this task is simple to define; the learning problem, by contrast, is hard to solve. The agent must explore a long sequence of correct actions before it reaches an environment configuration that returns the sparse reward: in this case, the object positioned inside the box. Finding this reward signal is a complicated ...
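
To make the scenario above concrete, the following is a minimal sketch of what such a sparse reward could look like. The box bounds, the workspace size, and the names `BOX_LOW`, `BOX_HIGH`, `sparse_reward`, and `random_success_rate` are all assumptions made for illustration; they do not come from any particular robotics library or environment.

```python
import numpy as np

# Hypothetical box interior as axis-aligned bounds in metres (illustrative values).
BOX_LOW = np.array([0.4, -0.1, 0.0])
BOX_HIGH = np.array([0.6, 0.1, 0.2])


def sparse_reward(object_position):
    """Return 1.0 only when the object lies inside the box, otherwise 0.0."""
    position = np.asarray(object_position, dtype=float)
    inside = np.all((position >= BOX_LOW) & (position <= BOX_HIGH))
    return 1.0 if inside else 0.0


def random_success_rate(n_samples=10_000, seed=0):
    """Estimate how often a uniformly random object position earns the reward.

    The workspace is assumed to be the cube [-1, 1]^3; this stands in for
    undirected exploration and is not a model of a real robot arm.
    """
    rng = np.random.default_rng(seed)
    positions = rng.uniform(low=-1.0, high=1.0, size=(n_samples, 3))
    return float(np.mean([sparse_reward(p) for p in positions]))


if __name__ == "__main__":
    print("inside box: ", sparse_reward([0.5, 0.0, 0.1]))
    print("outside box:", sparse_reward([0.0, 0.0, 0.0]))
    print("random success rate:", random_success_rate())
```

The reward itself takes only a few lines to write, which is the sense in which it is simple to define. Under these assumed bounds the box occupies a small fraction of the workspace, so almost every sampled position returns zero and gives the agent nothing to learn from.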

