6 Improving agents’ behaviors

In this chapter

  • You will learn about improving policies when learning from feedback that is simultaneously sequential and evaluative.
  • You will develop algorithms for finding optimal policies in reinforcement learning environments when the transition and reward functions are unknown.
  • You will write code for agents that can go from random to optimal behavior using only their experiences and decision making, and train the agents in a variety of environments.

When it is obvious that the goals cannot be reached, don’t adjust the goals, adjust the action steps.

— Confucius, Chinese teacher, editor, politician, and philosopher of the Spring and Autumn period of Chinese history

Up until this chapter, you’ve ...

