It should come as no surprise that it is quite easy to get into situations where training goes wrong. Unfortunately, these failures are not the kind with big explosions, but the kind where the agent simply stops learning or improving. This typically happens for one of the following reasons:
- The reward is wrong (sparse rewards): In general, keep rewards within the range of -1.0 to +1.0 and make them readily available (dense), so the agent receives frequent feedback rather than a single rare signal.
- The observations are wrong: Too many or too few observations can be a problem, depending on the model. Too few starve the agent of the signal it needs to act, while too many add noise and slow learning.
- Hyperparameters: This encompasses many parameters, and not understanding how to adjust them can lead to frustration. We will, of course, spend some time learning how to properly adjust them.
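To make the first point concrete, here is a minimal sketch of keeping rewards dense and inside the recommended -1.0 to +1.0 range. The function names, the distance-based shaping, and the 0.1 scaling factor are all hypothetical choices for illustration, not a prescribed API:

```python
def clip_reward(r, low=-1.0, high=1.0):
    """Clamp a raw reward into the recommended [-1.0, +1.0] range."""
    return max(low, min(high, r))

def shaped_reward(prev_dist, curr_dist, reached_goal):
    """Dense shaping: give a small reward for progress every step,
    plus a large terminal reward on success, instead of a single
    sparse reward that only appears at the goal."""
    if reached_goal:
        return 1.0                      # terminal success reward
    progress = prev_dist - curr_dist    # positive when moving closer
    return clip_reward(0.1 * progress)  # small, frequent signal

print(clip_reward(5.0))                 # 1.0
print(shaped_reward(10.0, 9.5, False))  # 0.05
```

With shaping like this, the agent sees a useful gradient of feedback every step, whereas a sparse scheme would leave it wandering with zero reward until it stumbles onto the goal by chance.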