
Chapter 3

  • What's a stochastic policy?
    • It's a policy defined as a probability distribution over actions, rather than a single fixed action for each state.
  • How can a return be defined in terms of the return at the next time step?
  • Why is the Bellman equation so important?
    • Because it provides a general formula to compute the value of a state using the current reward and the value of the subsequent state.
  • What are the limiting factors of DP algorithms?
    • Their computational cost explodes as the number of states grows, so they are practical only for problems with a limited number of states. The other constraint is that the dynamics of the system have to be fully known.
  • What's policy evaluation?
    • It's an iterative method to compute ...
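
The answers above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the book's own code: the function names, the two-state MDP, and the table layout (`P[s][a][s2]`, `R[s][a]`, `policy[s][a]`) are all hypothetical. The first function uses the standard recursion G_t = r_t + gamma * G_{t+1} to express a return in terms of the next return; the second repeatedly applies the Bellman expectation backup, which is the usual form of iterative policy evaluation and needs the full dynamics `P` and `R`.

```python
def discounted_returns(rewards, gamma):
    """Return G_t for every step of one episode, using G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    next_return = 0.0  # assumption: the return after the last step is 0
    for t in reversed(range(len(rewards))):
        next_return = rewards[t] + gamma * next_return
        returns[t] = next_return
    return returns


def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-10):
    """Estimate V(s) for a stochastic policy by sweeping the Bellman expectation backup.

    P[s][a][s2]  : probability of moving from s to s2 under action a (known dynamics)
    R[s][a]      : expected immediate reward for taking a in s
    policy[s][a] : probability that the policy picks a in s
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        V_new = []
        for s in range(n_states):
            v = 0.0
            for a, pi_sa in enumerate(policy[s]):
                # Expected value of the subsequent state under action a.
                expected_next = sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                # Current reward plus discounted value of the next state.
                v += pi_sa * (R[s][a] + gamma * expected_next)
            V_new.append(v)
        # Stop once a full sweep changes no value by more than tol.
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


# Hypothetical two-state MDP: state 1 is absorbing with zero reward.
# In state 0, action 0 stays put (reward 0) and action 1 moves to state 1 (reward 1).
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [
    [0.0, 1.0],
    [0.0, 0.0],
]
policy = [[0.5, 0.5], [0.5, 0.5]]  # stochastic policy: both actions equally likely

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))
print(policy_evaluation(P, R, policy, gamma=0.9))
```

Each sweep visits every state, every action, and every successor state, which is why the cost grows quickly with the size of the state space.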
