Chapter 9. Reinforcement Learning

Incentives drive nearly everything, and finance is not an exception. Humans do not learn from millions of labeled examples. Instead, we often learn from positive or negative experiences that we associate with our actions. Learning from experiences and the associated rewards or punishments is the core idea behind reinforcement learning (RL).

Reinforcement learning is an approach to training a machine to find the best course of action by learning a policy that maximizes rewards and minimizes punishments.
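To make the idea of learning from rewards and punishments concrete, here is a minimal sketch, not a method taken from this chapter. It assumes a toy setting with three possible actions whose average rewards are made up for illustration. The function name `run_bandit`, the parameters `reward_means` and `epsilon`, and the epsilon-greedy rule are all choices made for this example. The agent tries actions, observes noisy rewards, and gradually settles on the action with the highest estimated reward.

```python
import random

def run_bandit(reward_means, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_actions = len(reward_means)
    q = [0.0] * n_actions      # running estimate of each action's average reward
    counts = [0] * n_actions   # how often each action has been tried

    for _ in range(episodes):
        # Explore with probability epsilon, otherwise exploit the best estimate so far.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[a])

        # Noisy feedback: positive values act as rewards, negative ones as punishments.
        reward = rng.gauss(reward_means[action], 1.0)

        # Incremental update of the average reward observed for this action.
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]

    # The learned policy here is simply "pick the action with the highest estimate".
    greedy_action = max(range(n_actions), key=lambda a: q[a])
    return q, greedy_action

# Hypothetical average rewards for three actions; the last one is the best.
q_values, best_action = run_bandit(reward_means=[-0.5, 0.0, 0.8])
print(q_values, best_action)
```

This toy problem has no notion of state, so it leaves out most of what makes trading or portfolio problems hard. It shows only the reward-driven feedback loop that the definition above describes.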

The RL algorithms that empowered AlphaGo (the first computer program to defeat a professional human Go player) are also making inroads into finance. Reinforcement learning’s main idea of maximizing rewards aligns well with several areas in finance, including algorithmic trading and portfolio management. Reinforcement learning is particularly suitable for algorithmic trading, because the concept of a return-maximizing agent in an uncertain, dynamic environment has much in common with an investor or a trading strategy that interacts with financial markets. Reinforcement learning–based models go one step further than the price prediction–based trading strategies discussed in previous chapters and determine rule-based policies for actions (e.g., place an order, do nothing, or cancel an order).

Similarly, in portfolio management and asset allocation, reinforcement learning–based algorithms do not yield predictions and do not ...

