We have seen how an agent's policy can return either a single action or a probability distribution over a set of possible actions and how its value function can return how desirable a certain state is. But how can a model explain how it arrived at such predictions? As reinforcement learning becomes more popular and potentially more prevalent in real-life applications, there will be an ever-increasing need to be able to explain the output of reinforcement learning algorithms.

Today, most advanced reinforcement learning algorithms incorporate deep neural networks, which, as of now, can only be represented as a set of weights and a sequence of non-linear functions. Moreover, due to its high dimensional nature, neural ...

Get Python Reinforcement Learning Projects now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.