This appendix opens with an introduction to reinforcement learning, continues with a detailed explanation of Deep Q-Networks (DQNs) for pixel inputs, and concludes with an RL4J example. Let’s start with the core concepts of reinforcement learning.
Reinforcement learning is an exciting area of machine learning. At its core, it is about learning an efficient strategy for acting in a given environment. Informally, this is very similar to Pavlovian conditioning: you assign a reward for a given behavior, and, over time, the agent learns to reproduce that behavior in order to receive more rewards.
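To make the reward idea concrete, here is a minimal sketch in plain Java. It is not RL4J code. The class name `RewardSketch`, the two-action environment, and the epsilon-greedy exploration rule are all assumptions chosen for illustration. The agent keeps a running average of the reward it has received for each action and mostly repeats whichever action looks best so far.

```java
import java.util.Random;

// Hypothetical toy example: every name and parameter here is illustrative, not RL4J API.
class RewardSketch {

    // Assumed environment: action 1 is the "desired behavior" and earns a reward of 1;
    // action 0 earns nothing.
    static double reward(int action) {
        return action == 1 ? 1.0 : 0.0;
    }

    // Runs the agent for a number of steps and returns its estimated value per action.
    static double[] train(int steps, double epsilon, long seed) {
        Random rng = new Random(seed);
        double[] value = new double[2]; // running average reward per action
        int[] count = new int[2];       // how often each action has been tried

        for (int t = 0; t < steps; t++) {
            int action;
            if (rng.nextDouble() < epsilon) {
                // Explore: occasionally try a random action.
                action = rng.nextInt(2);
            } else {
                // Exploit: repeat the action with the highest estimate (ties go to action 0).
                action = value[1] > value[0] ? 1 : 0;
            }

            double r = reward(action);
            count[action]++;
            // Incremental mean: move the estimate toward the observed reward.
            value[action] += (r - value[action]) / count[action];
        }
        return value;
    }

    public static void main(String[] args) {
        double[] value = train(2000, 0.1, 42L);
        System.out.println("estimated value of action 0: " + value[0]);
        System.out.println("estimated value of action 1: " + value[1]);
    }
}
```

Once the agent has stumbled on the rewarded action during exploration, its estimate for that action exceeds the other, and the greedy branch keeps choosing it. This is the "learns to reproduce that behavior" effect in its simplest form; the methods discussed in this appendix handle far richer environments.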
Formally, an environment is defined as a Markov Decision Process (MDP). Behind this scary name is nothing more than a 5-tuple made up of the following elements: