Deep Learning by Adam Gibson, Josh Patterson

Appendix B. RL4J and Reinforcement Learning

Preliminaries

We begin this appendix with an introduction to reinforcement learning, followed by a detailed explanation of Deep Q-Networks (DQNs) for pixel inputs, and then we conclude by showing you an RL4J example. Let’s begin with a look at the core concepts of reinforcement learning.

Reinforcement learning is an exciting area of machine learning. At its core, it is the learning of an efficient strategy in a given environment. Informally, this is very similar to Pavlovian conditioning: you assign a reward for a given behavior, and, over time, the agent learns to reproduce that behavior in order to receive more rewards.

Markov Decision Process

Formally, an environment is defined as a Markov Decision Process (MDP). Behind this scary name is nothing more than a 5-tuple made up of the following:

  • A set of states S (e.g., in chess, a state is the board configuration).
  • A set of possible actions A (in chess, every possible move in every possible configuration; e.g., e4–e5).
  • The conditional distribution P(s′|s,a) of the next state s′, given the current state s and an action a. (In a deterministic environment like chess, exactly one state s′ has probability 1, and all the others have probability 0. In a stochastic environment (one involving randomness, like a coin toss), the distribution is not as simple.)
  • The reward function for transitioning from state s to s′: R(s,s′) (e.g., in chess, +1 for a final move that ...
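
To make the components listed above concrete, here is a minimal Python sketch of a toy environment. It is only an illustration under stated assumptions: the two-state world, the names STATES, ACTIONS, P, reward, step, and is_deterministic, and the probabilities are all hypothetical and do not come from RL4J or from the book. It covers only the states, actions, transition distribution, and reward function.

```python
import random

# Hypothetical two-state toy environment (an assumption for illustration).
STATES = {"start", "goal"}   # the set of states S
ACTIONS = {"stay", "move"}   # the set of actions A

# P[(s, a)] maps each possible next state s' to its probability P(s' | s, a).
P = {
    ("start", "stay"): {"start": 1.0},               # deterministic transition
    ("start", "move"): {"goal": 0.8, "start": 0.2},  # stochastic transition
    ("goal", "stay"): {"goal": 1.0},
    ("goal", "move"): {"goal": 1.0},
}


def reward(s, s_next):
    """R(s, s'): +1 for entering the goal from another state, 0 otherwise."""
    return 1.0 if s != "goal" and s_next == "goal" else 0.0


def step(s, a, rng):
    """Sample s' from P(. | s, a) and return the pair (s', R(s, s'))."""
    dist = P[(s, a)]
    next_states = list(dist)
    weights = [dist[n] for n in next_states]
    s_next = rng.choices(next_states, weights=weights, k=1)[0]
    return s_next, reward(s, s_next)


def is_deterministic(s, a):
    """True when a single next state carries all of the probability mass."""
    return max(P[(s, a)].values()) == 1.0


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs sample the same way
    print(step("start", "stay", rng))
    print(step("start", "move", rng))
    print(is_deterministic("start", "move"))
```

Storing the distribution as a dictionary keyed by (state, action) keeps the deterministic case a special case of the stochastic one: a deterministic transition is simply a distribution with one entry of probability 1.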
