The goal of a reinforcement learning agent is to learn to perform a task well in an environment. Mathematically, this means to maximize the cumulative reward, R, which can be expressed in the following equation:
We are simply calculating a weighted sum of the reward received at each timestep.is called the discount factor, which is a scalar value between 0 and 1. The idea is that the later a reward comes, the less valuable it becomes. This reflects our perspectives on rewards as well; that we'd rather receive $100 now rather than a ...