January 2020
Intermediate to advanced
432 pages
10h 18m
English
As you may imagine, using advantage and training multiple networks to work together is not trivial. Therefore, we want to focus a whole exercise on understanding how training works in AC. Open up Chapter_8_ActorCritic.py again and follow the exercise:
s, a, r, s_prime, done = self.make_batch()
td_target = r + gamma * self.v(s_prime) * done
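To make the arithmetic of the TD target concrete, here is a minimal sketch in plain Python rather than the tensor code used in the chapter file. It assumes that `done` acts as a mask (0.0 for a terminal transition, 1.0 otherwise), which is what multiplying it into the bootstrap term suggests; the function name `td_targets_and_advantages`, its parameters, and the sample numbers are hypothetical and not taken from Chapter_8_ActorCritic.py.

```python
def td_targets_and_advantages(rewards, values, next_values, done_masks, gamma):
    """Compute one-step TD targets and advantages for a batch of transitions.

    Assumption: done_masks holds 0.0 for terminal transitions and 1.0
    otherwise, so the bootstrapped value is dropped when an episode ends.
    """
    targets, advantages = [], []
    for r, v, v_next, mask in zip(rewards, values, next_values, done_masks):
        # TD target: immediate reward plus discounted value of the next state.
        target = r + gamma * v_next * mask
        targets.append(target)
        # Advantage: how much better the target is than the critic's estimate.
        advantages.append(target - v)
    return targets, advantages


# Hypothetical batch of two transitions; the second one ends the episode.
rewards = [1.0, 1.0]
values = [0.5, 0.4]        # critic estimates V(s)
next_values = [0.6, 0.9]   # critic estimates V(s_prime)
done_masks = [1.0, 0.0]

targets, advantages = td_targets_and_advantages(
    rewards, values, next_values, done_masks, gamma=0.5
)
print(targets, advantages)
```

For the terminal transition the mask zeroes out the next-state value, so the target reduces to the reward alone.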