Appendix B. RL4J and Reinforcement Learning

Preliminaries

We begin this appendix with an introduction to reinforcement learning, followed by a detailed explanation of Deep Q-Networks (DQNs) for pixel inputs, and then we conclude by showing you an RL4J example. Let’s begin with a look at the core concepts of reinforcement learning.

Reinforcement learning is an exciting area of machine learning. At its core, it is about learning an efficient strategy for acting in a given environment. Informally, this is very similar to Pavlovian conditioning: you assign a reward for a given behavior and, over time, the agent learns to reproduce that behavior in order to receive more rewards.

Markov Decision Process

Formally, an environment is defined as a Markov Decision Process (MDP). Behind this scary name is nothing more than a 5-tuple made up of the following:

  • A set of states S (e.g., in chess, a state is the board configuration).
  • A set of possible actions A (in chess, every possible move in every possible configuration; e.g., e4–e5).
  • The conditional distribution P(s′|s,a) of the next state, given a current state and an action. (In a deterministic environment like chess, exactly one state s′ has probability 1 and all the others have probability 0. In a stochastic environment, one that involves randomness such as a coin toss, the distribution is not as simple.)
  • The reward function of transitioning from state s to s′: R(s,s′) (e.g., in chess, +1 for a final move that leads ...
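
To make the components listed above concrete, here is a minimal Python sketch of a toy environment. It is an illustration only, not RL4J code: the two-state world, the names STATES, ACTIONS, P, reward, and step, and all probabilities are hypothetical. It covers only the states, actions, transition distribution P(s′|s,a), and reward R(s,s′).

```python
import random

# Hypothetical two-state environment, invented purely for illustration.
STATES = ["start", "goal"]   # the set of states S
ACTIONS = ["stay", "move"]   # the set of actions A

# P[(s, a)] maps each reachable next state s' to its probability P(s'|s,a).
P = {
    ("start", "stay"): {"start": 1.0},               # deterministic transition
    ("start", "move"): {"goal": 0.8, "start": 0.2},  # stochastic transition
    ("goal", "stay"): {"goal": 1.0},
    ("goal", "move"): {"start": 1.0},
}


def reward(s, s_next):
    """R(s, s'): +1 for entering the goal from another state, otherwise 0."""
    return 1.0 if s != "goal" and s_next == "goal" else 0.0


def step(s, a, rng):
    """Sample s' from P(.|s, a) and return the pair (s', R(s, s'))."""
    dist = P[(s, a)]
    next_states = list(dist)
    weights = [dist[x] for x in next_states]
    s_next = rng.choices(next_states, weights=weights)[0]
    return s_next, reward(s, s_next)
```

Calling step("start", "move", random.Random(0)) samples one transition. In the deterministic rows the next state is always the same, whereas the ("start", "move") row can lead to either state, which is the distinction drawn in the third bullet.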
