Training DDPG

Now, as you may have noticed in the last example, Chapter_8_DDPG.py uses four networks/models for training: two actors and two critics, where one network in each pair serves as the target and the other as the current network. This gives us the following diagram:
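To make the four-network setup concrete, here is a minimal sketch in PyTorch. The `Actor` and `Critic` class names, layer sizes, and dimensions are illustrative assumptions, not the exact definitions used in Chapter_8_DDPG.py:

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: maps a state to a continuous action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh())

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q/value network: maps a (state, action) pair to a scalar value."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

state_dim, action_dim = 3, 1   # illustrative sizes only
actor = Actor(state_dim, action_dim)
critic = Critic(state_dim, action_dim)
# The two target networks start as exact copies of the current networks.
actor_target = copy.deepcopy(actor)
critic_target = copy.deepcopy(critic)
```

The targets deliberately lag behind the current networks, which stabilizes training by keeping the learning targets from shifting at every step.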

Diagram of actor-critic target-current networks

Each oval in the preceding diagram represents a complete deep learning network. Notice how the critic, the value or Q network, takes both of the environment's outputs: the reward and the state. The critic then pushes a value back to the target actor, or policy, network.
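The following sketch shows how that flow plays out in one DDPG update step, under the same assumptions as the code above. The critic combines the reward and next state from the environment into a learning target computed through the target networks, and the critic's value then drives the actor's update (terminal-state handling is omitted for brevity, and the hyperparameter values are illustrative):

```python
gamma, tau = 0.99, 0.005   # discount factor and soft-update rate
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(state, action, reward, next_state):
    # Critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s')),
    # where the primed networks are the slow-moving targets.
    with torch.no_grad():
        target_q = reward + gamma * critic_target(
            next_state, actor_target(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's valuation of the actor's own action.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft-update the targets a small step toward the current networks.
    for tgt, cur in ((actor_target, actor), (critic_target, critic)):
        for tp, cp in zip(tgt.parameters(), cur.parameters()):
            tp.data.mul_(1 - tau).add_(tau * cp.data)
```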

Open the Chapter_8_DDPG.py example back up and follow the next exercise ...
