Chapter 3
- What's a stochastic policy?
- It's a policy defined in terms of a probability distribution over the actions given the state, π(a|s), so the action is sampled rather than chosen deterministically.
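A minimal sketch of the idea (the states, actions, and probabilities below are illustrative): a stochastic policy is just a per-state distribution over actions, and acting means sampling from it.

```python
import random

# Hypothetical policy: for each state, a probability distribution over actions.
policy = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"left": 0.1, "right": 0.9},
}

def sample_action(state, rng=random.Random(0)):
    """Draw an action from the policy's distribution for this state."""
    actions, probs = zip(*policy[state].items())
    return rng.choices(actions, weights=probs, k=1)[0]
```

Because the action is sampled, repeated visits to the same state can yield different actions, which is what distinguishes a stochastic policy from a deterministic one.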
- How can a return be defined in terms of the return at the next time step?
- Recursively: the return is the immediate reward plus the discounted return from the next time step, G_t = r_{t+1} + γ G_{t+1}.
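The recursion G_t = r_{t+1} + γ G_{t+1} can be checked against the direct discounted sum on a toy reward sequence (the rewards below are made up for illustration):

```python
def ret(rewards, t, gamma=0.9):
    """Return at time t, computed via the recursion G_t = r + gamma * G_{t+1}."""
    if t >= len(rewards):
        return 0.0
    return rewards[t] + gamma * ret(rewards, t + 1, gamma)

rewards = [1.0, 0.0, 2.0]
recursive = ret(rewards, 0)
# Direct definition: discounted sum of all future rewards.
direct = sum((0.9 ** k) * r for k, r in enumerate(rewards))
```

Both computations give the same value, which is exactly why the recursive form is useful: it lets the value of a state be expressed through the value of its successor.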
- Why is the Bellman equation so important?
- Because it provides a general formula for computing the value of a state from the immediate reward and the discounted value of the successor state, without having to unroll the whole trajectory.
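A one-step Bellman backup for a single state can be sketched as follows, assuming the dynamics are known (the transition probabilities, rewards, and successor values below are illustrative):

```python
GAMMA = 0.9

# Each outcome is (probability, reward, value of the next state).
outcomes = [
    (0.8, 1.0, 5.0),
    (0.2, 0.0, 2.0),
]

def bellman_backup(outcomes, gamma):
    """V(s) = sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * v_next) for p, r, v_next in outcomes)

value = bellman_backup(outcomes, GAMMA)
```

The backup combines only the immediate reward and the successor values, which is the property that makes the equation the workhorse of dynamic programming methods.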
- What are the limiting factors of DP algorithms?
- The computational cost explodes with the number of states (the curse of dimensionality), so DP is practical only for problems with a limited state space. The other constraint is that the dynamics of the environment have to be fully known.
- What's policy evaluation?
- It's an iterative method to compute the value function of a given policy: the Bellman expectation equation is applied repeatedly to every state until the values converge.
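The iteration can be sketched on a toy two-state MDP (the transition model below is invented for illustration): every state's value is updated with a Bellman backup, and the sweep repeats until the largest change drops below a threshold.

```python
GAMMA = 0.9

# transitions[s] = list of (probability, reward, next_state) under the policy.
transitions = {
    "A": [(1.0, 1.0, "B")],
    "B": [(0.5, 0.0, "A"), (0.5, 2.0, "B")],
}

def policy_evaluation(transitions, gamma, tol=1e-8):
    """Iteratively apply the Bellman expectation equation until convergence."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, outs in transitions.items():
            new_v = sum(p * (r + gamma * V[s2]) for p, r, s2 in outs)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

values = policy_evaluation(transitions, GAMMA)
```

Note that this sketch requires the full transition model, which illustrates the DP limitation mentioned above: the dynamics must be known in advance.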