Let's understand A3C with a mountain car example. Our agent is the car and it is placed between two mountains. The goal of our agent is to drive up the mountain on the right. However, the car can't drive up the mountain in one pass; it has to drive up back and forth to build the momentum. A high reward will be assigned if our agent spends less energy on driving up. Credits for the code used in this section goes to Stefan Boschenriedter (https://github.com/stefanbo92/A3C-Continuous). The environment is shown as follows:
Okay, let's get to the coding! The complete code is available as the Jupyter notebook with ...