6 Improving agents’ behaviors

In this chapter

You will learn about improving policies when learning from feedback that is simultaneously sequential and evaluative.
You will develop algorithms for finding optimal policies in reinforcement learning environments when the transition and reward functions are unknown.
You will write code for agents that can go from random to optimal behavior using only their experiences and decision making, and train the agents in a variety of environments.

When it is obvious that the goals cannot be reached, don’t adjust the goals, adjust the action steps.

— Confucius, Chinese teacher, editor, politician, and philosopher of the Spring and Autumn period of Chinese history

Up until this chapter, you’ve studied ...

Grokking Deep Reinforcement Learning by Miguel Morales