Chapter 2 Single-Agent Reinforcement Learning
The objective of this chapter is to introduce the reader to reinforcement learning. A good introductory book on the topic is Reference [1], and we will follow its notation. The goal of reinforcement learning is to maximize a reward. The interesting aspect of reinforcement learning, as with unsupervised learning methods, is the choice of rewards. In this chapter, we discuss some of the fundamental ideas in reinforcement learning that we will refer to in the rest of the book. We start with the simple n-armed bandit problem and then present ideas on the meaning of the “value” function.
2.1 Introduction
Reinforcement learning is learning to map situations to actions so as to maximize a numerical reward [1]. The learner is not told which actions to take; it must discover which actions yield the most reward by trying them. Actions may affect not only the immediate reward but also the next situation and, through it, all subsequent rewards [1]. Unlike supervised learning, which is learning from examples provided by a knowledgeable external supervisor, reinforcement learning is learning from interaction [1]. Since it is often impractical to obtain examples of desired behavior that are both correct and representative of all situations, the learner must be able to learn from its own experience [1]. Therefore, the reinforcement ...
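The idea of discovering rewarding actions by trying them can be illustrated with a minimal sketch of an n-armed bandit learner. The sketch is an illustration only, not the method developed in this chapter or in Reference [1]. It assumes an epsilon-greedy rule for choosing actions, Gaussian rewards with unit variance, and sample-average reward estimates. The function name `run_bandit` and its parameters are hypothetical.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on an n-armed bandit (illustrative sketch).

    true_means: hypothetical mean reward of each arm, unknown to the learner.
    Rewards are assumed Gaussian with unit variance (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # learner's running estimate of each arm's reward
    counts = [0] * n       # number of times each arm has been tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm chosen uniformly at random.
            action = rng.randrange(n)
        else:
            # Exploit: pick the arm with the highest current estimate.
            action = max(range(n), key=lambda a: estimates[a])

        # The environment returns a noisy reward for the chosen arm.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.1, 0.5, 1.5], steps=5000)
    print("estimates:", [round(q, 2) for q in estimates])
    print("pull counts:", counts)
    print("average reward:", round(total / 5000, 2))
```

No supervisor ever tells the learner which arm is best. It receives only the numerical reward for the arm it actually pulled, so it must occasionally explore in order to improve its estimates.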