Chapter 2. Deep Q-Learning

Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial and error, solely from rewards or punishments, is known as reinforcement learning (RL).1

DeepMind (2016)

The previous chapter introduces deep Q-learning (DQL) as a major AI algorithm that learns through interaction with an environment. This chapter describes the DQL algorithm in more detail. It uses the CartPole environment from the Gymnasium Python package to illustrate API-based interaction with gaming environments. It also implements a DQL agent as a self-contained Python class that serves as a blueprint for later DQL agents applied to financial environments.

However, before turning to DQL, the chapter discusses general decision problems in economics and finance. Dynamic programming is introduced as a solution method for dynamic decision problems. This provides the background for applying DQL algorithms, which can be viewed as yielding approximate solutions to dynamic programming problems.
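To make the idea of solving a dynamic decision problem by dynamic programming concrete, the following sketch applies backward induction to a toy problem. The problem itself is an assumption chosen for brevity and does not come from this chapter: an agent holds an integer amount of wealth, consumes part of it in each period for a reward equal to the square root of consumption, and carries the rest forward. Transitions are deterministic here, whereas Markovian problems in general allow random transitions. The function name backward_induction and its parameters are hypothetical.

```python
import math


def backward_induction(max_wealth: int, horizon: int, beta: float = 1.0):
    """Solve a toy finite-horizon consumption problem by backward induction.

    value[t][w]  : best total discounted reward from period t on with wealth w
    policy[t][w] : optimal consumption in period t with wealth w
    """
    # after the final period nothing can be earned, so value[horizon][w] = 0
    value = [[0.0] * (max_wealth + 1) for _ in range(horizon + 1)]
    policy = [[0] * (max_wealth + 1) for _ in range(horizon)]
    # work backward from the last decision period to the first
    for t in range(horizon - 1, -1, -1):
        for w in range(max_wealth + 1):
            best_c, best_v = 0, -math.inf
            for c in range(w + 1):
                # immediate reward plus discounted value of the remaining wealth
                v = math.sqrt(c) + beta * value[t + 1][w - c]
                if v > best_v:
                    best_c, best_v = c, v
            value[t][w] = best_v
            policy[t][w] = best_c
    return value, policy


if __name__ == "__main__":
    value, policy = backward_induction(max_wealth=4, horizon=2)
    print(policy[0][4], value[0][4])
```

With two periods, four units of wealth, and no discounting, the optimal plan in this toy problem splits consumption evenly, two units per period. Backward induction enumerates every state and action, which is only feasible for small problems. That limitation is one reason why approximate methods are attractive for larger ones.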

“Decision Problems” classifies decision problems in economics and finance according to different characteristics. “Dynamic Programming” focuses on a special type of decision problem: so-called finite horizon Markovian dynamic programming problems. “Q-Learning” outlines the major elements of Q-learning and explains the role of deep neural networks in this context. ...
