6 Improving agents’ behaviors

In this chapter

  • You will learn about improving policies when learning from feedback that is simultaneously sequential and evaluative.
  • You will develop algorithms for finding optimal policies in reinforcement learning environments when the transition and reward functions are unknown.
  • You will write code for agents that can go from random to optimal behavior using only their experiences and decision making, and train the agents in a variety of environments.

When it is obvious that the goals cannot be reached, don’t adjust the goals, adjust the action steps.

— Confucius, Chinese teacher, editor, politician, and philosopher of the Spring and Autumn period of Chinese history

Up until this chapter, you’ve ...

