Policy objective functions

Let's discuss now how to optimize a policy. In policy methods, our main objective is that a given policy  with parameter vector  finds the best values of the parameter vector. In order to measure which is the best, we measure  the quality of the policy  for different values of the parameter vector .

Before discussing the ...

Get Reinforcement Learning with TensorFlow now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.