Up to now we have been focusing on model-based and model-free methods. All the algorithms using these methods estimated the action values for a given current policy. In a second step, these estimated values were used to find a better policy by choosing the best action in a given state. These two steps were carried out in a loop again and again until no further improvement in values was observed. In this chapter, we will look at a different approach for learning optimal policies by directly operating in the policy space. We will improve the policies without explicitly learning ...
7. Policy Gradient Algorithms
Get Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.