Chapter 9. Reinforcement Learning

Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning.

DeepMind (2016)

The learning algorithms applied in Chapters 7 and 8 fall into the category of supervised learning. These methods require a data set of features and labels from which the algorithms can learn the relationships needed to succeed at estimation or classification tasks. As the simple example in Chapter 1 illustrates, reinforcement learning (RL) works differently. To begin with, there is no need for a comprehensive data set of features and labels to be given up front. Rather, the data is generated by the learning agent while interacting with the environment of interest. This chapter covers RL in some detail and introduces fundamental notions, as well as one of the most popular algorithms used in the field: Q-learning (QL). RL algorithms do not replace neural networks; neural networks generally play an important role in this context as well.
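To make the contrast with supervised learning concrete, the following sketch shows tabular Q-learning on a deliberately tiny, hypothetical two-state environment (not one discussed in this chapter): the agent generates its own data by interacting with the environment and updates its action-value estimates from the rewards it observes. The environment, hyperparameter values, and episode count are illustrative assumptions only.

```python
import random

# A minimal, hypothetical two-state chain environment used purely for
# illustration: from state 0, action 1 moves to state 1 (reward 0);
# from state 1, action 1 reaches the goal (reward 1, episode ends);
# action 0 always stays put.
def step(state, action):
    if action == 1 and state == 0:
        return 1, 0.0, False   # next_state, reward, done
    if action == 1 and state == 1:
        return 0, 1.0, True
    return state, 0.0, False

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        if random.random() < epsilon:
            action = random.choice((0, 1))
        else:
            action = max((0, 1), key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward the bootstrapped target
        target = reward + (0.0 if done else
                           gamma * max(Q[(next_state, a)] for a in (0, 1)))
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
```

After training, the learned values reflect the discounted reward structure: the action that earns the reward directly is valued near 1.0, and the action one step earlier near gamma times that.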

“Fundamental Notions” explains fundamental notions in RL, such as environments, states, and agents. “OpenAI Gym” introduces the OpenAI Gym suite of RL environments of which the CartPole environment is used as an example. In this environment, which Chapter 2 introduces and discusses briefly, agents must learn how to balance ...
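Environments such as CartPole all share the same interaction pattern: the agent calls `reset()` to obtain an initial observation, then repeatedly calls `step(action)` to receive the next observation, a reward, and a done flag. The sketch below illustrates that loop with a stub environment standing in for the real one (the actual environment would be created via the Gym package, which is not assumed to be installed here); the stub's dynamics and step limit are illustrative assumptions only.

```python
import random

# A stub that mimics the Gym-style interface
# (reset() -> observation; step(action) -> observation, reward, done, info).
# It stands in for a real environment purely to show the interaction loop.
class StubCartPole:
    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        # observation: cart position, cart velocity, pole angle, pole velocity
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        assert action in (0, 1)          # push the cart left or right
        self.t += 1
        done = self.t >= self.max_steps  # episode ends after max_steps
        return [0.0] * 4, 1.0, done, {}  # reward of 1 per step survived

env = StubCartPole()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice((0, 1))       # a random policy; no learning yet
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

A learning agent would replace the random action choice with a policy improved from the observed rewards; the surrounding loop stays the same.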
