To avoid any confusion, it is important to understand that A2C and A3C both use AC; they differ in how they update their models. In A2C, the update is synchronous: every worker brain feeds its thoughts into the main brain in step with the others, and the main brain is updated once from all of them together.
Let's see how this looks in code by opening the Chapter_9_A2C.py file and reviewing the hyperparameters inside it:
    n_train_processes = 3
    learning_rate = 0.0002
    update_interval = 5
    gamma = 0.98
    max_train_steps = 60000
    PRINT_INTERVAL = update_interval * 100
    environment = "LunarLander-v2"
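To get a feel for how `gamma`, `update_interval`, and `n_train_processes` typically interact in an A2C-style update, here is a minimal sketch. It is not taken from Chapter_9_A2C.py: the helper `n_step_targets` and the sample rollout are hypothetical, and the sketch assumes that each worker collects `update_interval` steps before every synchronous update.

```python
# Hyperparameter values copied from the listing above.
n_train_processes = 3
update_interval = 5
gamma = 0.98


def n_step_targets(rewards, dones, bootstrap_value, gamma):
    """Discounted return targets for one worker's rollout (hypothetical helper).

    rewards, dones: one entry per step of the rollout
    bootstrap_value: the critic's value estimate for the state after the rollout
    """
    targets = []
    running = bootstrap_value
    # Walk the rollout back to front so each step reuses the later return.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # A terminal step cuts off any value coming from later states.
        running = reward + gamma * running * (0.0 if done else 1.0)
        targets.append(running)
    targets.reverse()
    return targets


# Made-up rollout for a single worker: reward 1.0 per step, no episode end.
rewards = [1.0] * update_interval
dones = [False] * update_interval
targets = n_step_targets(rewards, dones, bootstrap_value=0.0, gamma=gamma)

# Assumption: every worker contributes one such rollout to each update.
transitions_per_update = n_train_processes * update_interval

print(targets)
print(transitions_per_update)
```

Under these assumptions, a larger `update_interval` gives the critic longer rollouts to learn from but makes each update less frequent, while `gamma` controls how quickly later rewards fade from the target.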
Keep the sample open and follow these steps to continue with this exercise:
- This is a large code example, so we will limit the sections we show here. The main thing of note here is the hyperparameters that are listed ...