Reinforcement learning solution methods

In this section, we discuss in detail some of the methods used to solve Reinforcement Learning problems: dynamic programming (DP), Monte Carlo methods, and temporal-difference (TD) learning. These methods also address the problem of delayed rewards.

Dynamic Programming (DP)

DP is a set of algorithms used to compute optimal policies given a model of the environment, such as a Markov Decision Process. Because DP methods are computationally expensive and assume a perfect model of the environment, they see limited practical use. Conceptually, however, DP is the basis for many of the algorithms and methods used in the following sections:

  1. Evaluating the policy: A policy can be assessed by computing the value function ...
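To make the idea of evaluating a policy with a known model concrete, here is a minimal sketch of iterative policy evaluation using the Bellman expectation backup. It is an illustration under stated assumptions, not a method taken from the text: the function name `policy_evaluation`, the array layout (`P`, `R`, `policy`), the discount factor `gamma`, and the two-state toy MDP are all hypothetical.

```python
import numpy as np


def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Estimate the state-value function V of a fixed policy.

    Assumed (hypothetical) array layout:
      P[s, a, s2]  : probability of moving from state s to s2 under action a
      R[s, a]      : expected immediate reward for taking action a in state s
      policy[s, a] : probability that the policy selects action a in state s
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman expectation backup: action values under the current V.
        # P has shape (S, A, S) and V has shape (S,), so P @ V has shape (S, A).
        Q = R + gamma * (P @ V)
        # Average the action values using the policy's action probabilities.
        V_new = (policy * Q).sum(axis=1)
        # Stop once a full sweep changes no state value by more than tol.
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Toy MDP (an assumption for illustration): state 1 is absorbing with zero
# reward; in state 0, action 0 stays in state 0 and earns a reward of 1,
# while action 1 moves to state 1 and earns nothing.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 0 under actions 0, 1
    [[0.0, 1.0], [0.0, 1.0]],  # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],
    [0.0, 0.0],
])
# Deterministic policy that always picks action 0.
policy = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
])

V = policy_evaluation(P, R, policy)
print(V)
```

Under this policy the agent stays in state 0 forever, so the value of state 0 satisfies V(0) = 1 + 0.9 V(0), which gives V(0) = 10, and the absorbing state has value 0. The sketch also shows why DP needs a perfect model: every backup reads the full transition array `P`.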
