Summary
In this chapter, we built on what we learned in Chapter 8, Reinforcement Learning, to cover DDPG and HER, and how to combine these methods to create a reinforcement learning algorithm that independently controls a robotic arm.
The Deep Q network that we used to solve game challenges worked in discrete action spaces; when building algorithms for more fluid motion tasks, such as robots or self-driving cars, we need a class of algorithms that can handle continuous action spaces. For this, we use policy gradient methods, which directly learn a policy that maps states to actions. We can improve this learning by using an experience replay buffer, which stores past experiences so that they may be sampled during training ...
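As a reminder of how an experience replay buffer works, here is a minimal sketch in plain Python. It is not the chapter's implementation: the names `Transition` and `ReplayBuffer`, the fixed capacity, and the uniform random sampling are all assumptions made for illustration, and the hindsight relabeling that HER adds is left out.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one step of agent-environment interaction.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size buffer that keeps the most recent transitions (illustrative only)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement; never ask for more than is stored.
        batch_size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Usage: store five dummy transitions in a buffer that holds only three.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=step, action=0.1 * step, reward=-1.0,
               next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Sampling stored transitions at random, rather than training on them in the order they occurred, breaks up the correlation between consecutive steps. That is the usual motivation for a replay buffer in off-policy methods such as DDPG.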