Chapter 4. Simulated Data

It is often said that data is the new oil, but this analogy is not quite right. Oil is a finite resource that must be extracted and refined, whereas data is an infinite resource that is constantly being generated and refined.

Halevy et al. (2009)

A major drawback of the financial environment as introduced in the previous chapter is that it relies by default on a single, historical financial time series. Such a data set is too limited to train a deep Q-learning (DQL) agent. It is like training an AI on a single game of chess and expecting it to play chess well in general.

This chapter introduces simulation-based approaches to augmenting the available data for the training of a DQL agent. The first approach, as introduced in “Noisy Time Series Data”, is to add random noise to a static financial time series. Although it is commonly agreed upon that financial time series data generally already contains noise—as compared to price movements or returns that are information induced—the idea is to train the agent on a large number of similar time series in the hope that it learns to distinguish information from noise.

The second approach, discussed in “Simulated Time Series Data”, is to generate financial time series data through simulation under certain constraints and assumptions. In general, a stochastic differential equation is assumed for the dynamics of the time series. The time series is then simulated given a discretization scheme and appropriate boundary conditions. This is one of the core numerical approaches used in computational finance to price financial derivatives or to manage financial risks, for example (see Glasserman [2004]).

Both data augmentation methods discussed in this chapter make it possible to generate an unlimited amount of training, validation, and test data for reinforcement learning.

Noisy Time Series Data

This section adjusts the first Finance environment from “Finance Environment” to add white noise, which is normally distributed data, to the original financial time series. First, add the helper class for the action space:

In [1]: import random

        class ActionSpace:
            def sample(self):
                return random.randint(0, 1)  # random action: 0 or 1

The new NoisyData environment class only requires a few adjustments compared with the original Finance class. In the following Python code, two parameters are added to the initialization method:

In [2]: import numpy as np
        import pandas as pd
        from numpy.random import default_rng  1

In [3]: rng = default_rng(seed=100)  1

In [4]: class NoisyData:
            url = 'https://certificate.tpq.io/findata.csv'
            def __init__(self, symbol, feature, n_features=4,
                         min_accuracy=0.485, noise=True,
                         noise_std=0.001):
                self.symbol = symbol
                self.feature = feature
                self.n_features = n_features
                self.noise = noise  2
                self.noise_std = noise_std  3
                self.action_space = ActionSpace()
                self.min_accuracy = min_accuracy
                self._get_data()
                self._prepare_data()
            def _get_data(self):
                self.raw = pd.read_csv(self.url,
                        index_col=0, parse_dates=True)
1

The random number generator is imported and initialized.

2

The flag that specifies whether noise is added or not.

3

The noise level to be used when adjusting the data; it is given as a fraction of the mean price level (for example, 0.001 corresponds to 0.1%).

The following part of the Python class code is the most important one. It is where the noise is added to the original time series data:

In [5]: class NoisyData(NoisyData):
            def _prepare_data(self):
                self.data = pd.DataFrame(self.raw[self.symbol]).dropna()
                if self.noise:
                    std = self.data.mean() * self.noise_std  1
                    self.data[self.symbol] = (self.data[self.symbol] +
                        rng.normal(0, std, len(self.data)))  2
                self.data['r'] = np.log(self.data / self.data.shift(1))
                self.data['d'] = np.where(self.data['r'] > 0, 1, 0)
                self.data.dropna(inplace=True)
                ma, mi = self.data.max(), self.data.min()  3
                self.data_ = (self.data - mi) / (ma - mi)  3
            def reset(self):
                if self.noise:
                    self._prepare_data()  4
                self.bar = self.n_features
                self.treward = 0
                state = self.data_[self.feature].iloc[
                    self.bar - self.n_features:self.bar].values
                return state, {}
1

The standard deviation for the noise is calculated in absolute terms.

2

The white noise is added to the time series data.

3

The features data is normalized through min-max scaling.

4

A new noisy time series data set is generated.
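The noise mechanism can also be tried in isolation. The following is a minimal sketch, not the book's code: the price series is a hypothetical stand-in for the historical data, and `add_white_noise` is an illustrative helper name. As in `_prepare_data()`, the standard deviation is the mean price level times `noise_std`, and every call draws fresh noise.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

rng = default_rng(seed=100)

# Hypothetical stand-in for the historical price series (not the book's data).
index = pd.date_range('2022-01-03', periods=250, freq='B')
prices = pd.Series(1.10 + 0.0005 * np.arange(250), index=index, name='EUR=')


def add_white_noise(series, noise_std, rng):
    """Return a noisy copy; noise_std is a fraction of the mean price level."""
    std = series.mean() * noise_std  # absolute standard deviation
    return series + rng.normal(0, std, len(series))


noisy_a = add_white_noise(prices, 0.005, rng)
noisy_b = add_white_noise(prices, 0.005, rng)

print(prices.mean() * 0.005)                # absolute noise level
print((noisy_a - noisy_b).abs().max() > 0)  # the two draws differ
```

Because the generator state advances with every call, two consecutive calls yield two different noisy series, which is what happens on every `reset()` of the environment.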

Information Versus Noise

Generally, it is assumed that financial time series data includes a certain amount of noise already. Investopedia defines noise as follows: “Noise refers to information or activity that confuses or misrepresents genuine underlying trends.” In this section, we take the historical price series as given and actively add noise to it. The idea is that a DQL agent learns about the fundamental price and/or return trends embodied by the historical data set.

The final part of the Python class, the .step() method, can remain unchanged:

In [6]: class NoisyData(NoisyData):
            def step(self, action):
                if action == self.data['d'].iloc[self.bar]:
                    correct = True
                else:
                    correct = False
                reward = 1 if correct else 0
                self.treward += reward
                self.bar += 1
                self.accuracy = self.treward / (self.bar - self.n_features)
                if self.bar >= len(self.data):
                    done = True
                elif reward == 1:
                    done = False
                elif (self.accuracy < self.min_accuracy and
                      self.bar > self.n_features + 15):
                    done = True
                else:
                    done = False
                next_state = self.data_[self.feature].iloc[
                    self.bar - self.n_features:self.bar].values
                return next_state, reward, done, False, {}
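The termination rule inside `.step()` can be isolated as a small pure function, which makes it easier to reason about. This is a sketch under the assumption that only the rule itself is of interest; `episode_done` and `grace` are hypothetical names that do not appear in the class.

```python
def episode_done(bar, n_bars, reward, accuracy,
                 min_accuracy=0.485, n_features=4, grace=15):
    """Mirror of the termination logic in NoisyData.step()."""
    if bar >= n_bars:   # no data left
        return True
    if reward == 1:     # a correct prediction never ends the episode
        return False
    # A wrong prediction ends the episode only after the grace period
    # and only if the running accuracy is below the threshold.
    return accuracy < min_accuracy and bar > n_features + grace


print(episode_done(bar=10, n_bars=100, reward=0, accuracy=0.2))  # grace period
print(episode_done(bar=30, n_bars=100, reward=0, accuracy=0.4))  # too inaccurate
```

The first call returns `False` because bar 10 still lies within the grace period; the second returns `True` because the accuracy of 0.4 is below the threshold of 0.485.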

Every time the financial environment is reset, a new time series is created by adding noise to the original time series. The following Python code illustrates this numerically:

In [7]: fin = NoisyData(symbol='EUR=', feature='EUR=',
                        noise=True, noise_std=0.005)

In [8]: fin.reset()  1
Out[8]: (array([0.79295659, 0.81097879, 0.78840972, 0.80597193]), {})

In [9]: fin.reset()  1
Out[9]: (array([0.80642276, 0.77840938, 0.80096369, 0.76938581]), {})

In [10]: fin = NoisyData('EUR=', 'r', n_features=4,
                         noise=True, noise_std=0.005)

In [11]: fin.reset()  2
Out[11]: (array([0.54198375, 0.30674865, 0.45688528, 0.52884033]), {})

In [12]: fin.reset()  2
Out[12]: (array([0.37967631, 0.40190291, 0.49196183, 0.47536065]), {})
1

Different initial states for the normalized price data

2

Different initial states for the normalized returns data
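How strongly a noisy series deviates from the original depends on `noise_std`. One rough diagnostic, sketched here on a hypothetical random-walk price path rather than on the book's data, is the share of up/down labels that change once noise is added. The helper names `directions` and `flipped_share` are illustrative only.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(seed=100)

# Hypothetical price path: geometric random walk with about 0.5% moves per step.
prices = 1.10 * np.exp(np.cumsum(rng.normal(0, 0.005, 1000)))


def directions(p):
    """1 if the log return is positive, else 0 (same rule as the 'd' column)."""
    return np.where(np.diff(np.log(p)) > 0, 1, 0)


def flipped_share(prices, noise_std, rng):
    """Share of direction labels that change when white noise is added."""
    std = prices.mean() * noise_std
    noisy = prices + rng.normal(0, std, len(prices))
    return float(np.mean(directions(prices) != directions(noisy)))


for level in (0.0, 0.001, 0.005):
    print(level, round(flipped_share(prices, level, rng), 3))
```

With a noise level of zero no label changes; larger noise levels relative to the typical price move change more labels, so the noisy series carries less of the original directional information.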

Finally, the following code visualizes several noisy time series data sets (see Figure 4-1):

In [13]: from pylab import plt, mpl
         plt.style.use('seaborn-v0_8')
         mpl.rcParams['figure.dpi'] = 300
         mpl.rcParams['savefig.dpi'] = 300
         mpl.rcParams['font.family'] = 'serif'

In [14]: import warnings
         warnings.simplefilter('ignore')

In [15]: for _ in range(5):
             fin.reset()
             fin.data[fin.symbol].loc['2022-7-1':].plot(lw=0.75, c='b')
Figure 4-1. Noisy time series data for half a year

Using the new type of environment, the DQL agent (see the Python class in “DQLAgent Python Class”) can now be trained with a new, noisy data set for each episode. As the following Python code shows, the agent reaches test accuracies of roughly 0.59 to 0.62, which suggests that it learns to distinguish between information (original movements) and the noisy components reasonably well:

In [16]: %run dqlagent.py

In [17]: os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

In [18]: agent = DQLAgent(fin.symbol, fin.feature, fin.n_features, fin)

In [19]: %time agent.learn(250)
         episode= 250 | treward=   8.00 | max=1441.00
         CPU times: user 27.3 s, sys: 3.92 s, total: 31.2 s
         Wall time: 26.9 s

In [20]: agent.test(5)
         total reward=2604 | accuracy=0.601
         total reward=2604 | accuracy=0.590
         total reward=2604 | accuracy=0.597
         total reward=2604 | accuracy=0.593
         total reward=2604 | accuracy=0.617

Simulated Time Series Data

In “Noisy Time Series Data”, a historical financial time series is adjusted by adding white noise to it. Here the financial time series itself is simulated under suitable assumptions. Both approaches have in common that they allow the generation of an infinite number of different paths. However, using the Monte Carlo simulation (MCS) approach in this section leads to quite different paths in general that only, on average, show desired properties—such as a certain drift or a certain volatility.

In the following, a stochastic process according to Vasicek (1977) is simulated. Originally used to model the stochastic evolution of interest rates, it allows the simulation of trending or mean-reverting financial time series. The Vasicek model with proportional volatility is described through the following stochastic differential equation:1

$$dx_t = \kappa (\theta - x_t)\,dt + \sigma x_t\,dZ_t$$

The variables and parameters have the following meanings: $x_t$ is the process level at date $t$, $\kappa$ is the mean-reversion factor, $\theta$ is the long-term mean of the process, and $\sigma$ is the constant volatility parameter for $Z_t$, which is a standard Brownian motion.

For the simulations, an Euler-Maruyama discretization scheme is used (with $s = t - \Delta t$ and $z_t$ being standard normal):

$$x_t = x_s + \kappa (\theta - x_s)\,\Delta t + \sigma x_s \sqrt{\Delta t}\, z_t$$
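The discretization scheme translates almost directly into code. The following is a minimal NumPy sketch, independent of the Simulation class; the function name `simulate_path` and the parameter values are assumptions chosen for illustration. For the noise-free case (σ = 0) the continuous-time solution is known, $x_t = \theta + (x_0 - \theta) e^{-\kappa t}$, which gives a simple sanity check for the scheme.

```python
import math
import numpy as np


def simulate_path(x0, kappa, theta, sigma, T, steps, rng):
    """Euler-Maruyama path for dx = kappa*(theta - x)*dt + sigma*x*dZ."""
    dt = T / steps
    x = np.empty(steps + 1)
    x[0] = x0
    z = rng.standard_normal(steps)  # one standard normal draw per step
    for t in range(1, steps + 1):
        x[t] = (x[t - 1] + kappa * (theta - x[t - 1]) * dt
                + sigma * x[t - 1] * math.sqrt(dt) * z[t - 1])
    return x


rng = np.random.default_rng(100)
det = simulate_path(1.0, 1.0, 1.1, 0.0, 1.0, 252, rng)  # no volatility
sto = simulate_path(1.0, 1.0, 1.1, 0.1, 1.0, 252, rng)  # with volatility

# Closed-form level after one year for the noise-free case.
exact = 1.1 + (1.0 - 1.1) * math.exp(-1.0)
print(det[-1], exact)
```

The final value of the noise-free path and the closed-form value agree to about three decimal places; the small gap is the discretization error of the Euler scheme with 252 steps.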

The Simulation class implements a financial environment that relies on the simulation of the stochastic process previously mentioned. The following Python code shows the initialization part of the class:

In [21]: class Simulation:
             def __init__(self, symbol, feature, n_features,
                          start, end, periods,
                          min_accuracy=0.525, x0=100,
                          kappa=1, theta=100, sigma=0.2,
                          normalize=True, new=False):
                 self.symbol = symbol
                 self.feature = feature
                 self.n_features = n_features
                 self.start = start  1
                 self.end = end  2
                 self.periods = periods  3
                 self.x0 = x0  4
                 self.kappa = kappa  4
                 self.theta = theta  4
                 self.sigma = sigma  4
                 self.min_accuracy = min_accuracy  5
                 self.normalize = normalize  6
                 self.new = new  7
                 self.action_space = ActionSpace()
                 self._simulate_data()
                 self._prepare_data()
1

The start date for the simulation

2

The end date for the simulation

3

The number of periods to be simulated

4

The model parameters for the simulation

5

The minimum accuracy required to continue

6

The parameter indicating whether normalization is applied to the data or not

7

The parameter indicating whether a new simulation is initiated for every episode or not

The following Python code shows the core method of the class. It implements the MCS for the stochastic process:

In [22]: import math
         class Simulation(Simulation):
             def _simulate_data(self):
                 index = pd.date_range(start=self.start,
                             end=self.end, periods=self.periods)
                 x = [self.x0]  1
                 dt = (index[-1] - index[0]).days / 365 / self.periods  2
                 for t in range(1, len(index)):
                     x_ = (x[t - 1] + self.kappa * (self.theta - x[t - 1]) * dt
                           + x[t - 1] * self.sigma * math.sqrt(dt) *
                           random.gauss(0, 1))  3
                     x.append(x_)  4

                 self.data = pd.DataFrame(x, columns=[self.symbol],
                                          index=index)  5
1

The initial value of the process (the boundary condition).

2

The length of one time step in years, derived from the simulated date range and the number of periods.

3

The Euler-Maruyama discretization scheme for the simulation itself.

4

The simulated value is appended to the list object.

5

The simulated process is transformed into a DataFrame object.
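As a quick check of how the step length and the index come about, the following sketch reproduces the date index for a one-year range with 252 periods (the dates are the ones used in the one-year examples of this chapter). Note that 252 points span only 251 intervals, so dividing by the number of periods is a close approximation of the actual spacing; this is an observation about the code, not a correction of it.

```python
import pandas as pd

periods = 252
index = pd.date_range(start='2024-1-1', end='2025-1-1', periods=periods)

days = (index[-1] - index[0]).days  # calendar days covered by the index
dt = days / 365 / periods           # step length in years, as in the class
spacing = index[1] - index[0]       # actual distance between two timestamps

print(days)
print(dt)
print(spacing)
```

Since 2024 is a leap year, the range covers 366 days, and the timestamps are spaced about one day and eleven hours apart. This is why the simulated data carries timestamps with odd times of day rather than business dates.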

Data preparation is taken care of by the following code:

In [23]: class Simulation(Simulation):
             def _prepare_data(self):
                 self.data['r'] = np.log(self.data / self.data.shift(1))  1
                 self.data.dropna(inplace=True)
                 if self.normalize:
                     self.mu = self.data.mean()  2
                     self.std = self.data.std()  2
                     self.data_ = (self.data - self.mu) / self.std  2
                 else:
                     self.data_ = self.data.copy()
                 self.data['d'] = np.where(self.data['r'] > 0, 1, 0)  3
                 self.data['d'] = self.data['d'].astype(int)  3
1

Derives the log returns for the simulated process

2

Applies Gaussian normalization to the data

3

Derives the directional values from the log returns
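Gaussian normalization (also known as z-score normalization) differs from the min-max scaling used in the NoisyData class: it centers every column at zero and rescales it to unit standard deviation instead of mapping it to the interval from 0 to 1. A minimal sketch on hypothetical data, mirroring the three normalization lines of `_prepare_data()`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)

# Hypothetical price path plus log returns, as in _prepare_data().
data = pd.DataFrame(
    {'EUR=': 1.1 * np.exp(0.005 * rng.standard_normal(500).cumsum())})
data['r'] = np.log(data['EUR='] / data['EUR='].shift(1))
data.dropna(inplace=True)

mu, std = data.mean(), data.std()
data_ = (data - mu) / std    # Gaussian (z-score) normalization
restored = data_ * std + mu  # inverse transformation

print(data_.mean().round(6))
print(data_.std().round(6))
```

Keeping `mu` and `std` as attributes, as the class does, makes it possible to map normalized values back to the original scale.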

The following methods are helper methods and allow you, for example, to reset the environment:

In [24]: class Simulation(Simulation):
             def _get_state(self):
                 return self.data_[self.feature].iloc[self.bar -
                                         self.n_features:self.bar]  1
             def seed(self, seed):
                 random.seed(seed)  2
                 tf.random.set_seed(seed)  2
             def reset(self):
                 self.treward = 0
                 self.accuracy = 0
                 self.bar = self.n_features
                 if self.new:
                     self._simulate_data()
                     self._prepare_data()
                 state = self._get_state()
                 return state.values, {}
1

Returns the current set of feature values

2

Fixes the seed for different random number generators

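The final method, .step(), is nearly the same as for the NoisyData class. The only difference is the grace period before early termination, which is self.bar > 25 instead of self.bar > self.n_features + 15: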

In [25]: class Simulation(Simulation):
             def step(self, action):
                 if action == self.data['d'].iloc[self.bar]:
                     correct = True
                 else:
                     correct = False
                 reward = 1 if correct else 0
                 self.treward += reward
                 self.bar += 1
                 self.accuracy = self.treward / (self.bar - self.n_features)
                 if self.bar >= len(self.data):
                     done = True
                 elif reward == 1:
                     done = False
                 elif (self.accuracy < self.min_accuracy and self.bar > 25):
                     done = True
                 else:
                     done = False
                 next_state = self.data_[self.feature].iloc[
                     self.bar - self.n_features:self.bar].values
                 return next_state, reward, done, False, {}

With the complete Simulation class, different processes can be simulated. The next code snippet uses three different sets of parameters:

  • Baseline: no volatility and trending (long-term mean > initial value)

  • Trend: volatility and trending (long-term mean > initial value)

  • Mean-reversion: volatility and mean-reverting (long-term mean = initial value)

Figure 4-2 shows the simulated processes graphically:

In [26]: sym = 'EUR='

In [27]: env_base = Simulation(sym, sym, 5, start='2024-1-1', end='2025-1-1',
                          periods=252, x0=1, kappa=1, theta=1.1, sigma=0.0,
                          normalize=True)  1
         env_base.seed(100)

In [28]: env_trend = Simulation(sym, sym, 5, start='2024-1-1', end='2025-1-1',
                          periods=252, x0=1, kappa=1, theta=2, sigma=0.1,
                          normalize=True)  2
         env_trend.seed(100)

In [29]: env_mrev = Simulation(sym, sym, 5, start='2024-1-1', end='2025-1-1',
                          periods=252, x0=1, kappa=1, theta=1, sigma=0.1,
                          normalize=True)  3
         env_mrev.seed(100)

In [30]: env_mrev.data[sym].iloc[:3]
Out[30]: 2024-01-02 10:59:45.657370517    1.004236
         2024-01-03 21:59:31.314741035    1.009752
         2024-01-05 08:59:16.972111553    1.011010
         Name: EUR=, dtype: float64

In [31]: env_base.data[sym].plot(figsize=(10, 6), label='baseline', style='r')
         env_trend.data[sym].plot(label='trend', style='b:')
         env_mrev.data[sym].plot(label='mean-reversion', style='g--')
         plt.legend();
1

The baseline case

2

The trend case

3

The mean-reversion case

Figure 4-2. The simulated processes2
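The levels that the three processes approach can be cross-checked analytically. Because the drift is linear in the process level, the expected level at horizon $T$ is $\theta + (x_0 - \theta) e^{-\kappa T}$, with or without volatility. The sketch below evaluates this for the three parameter sets; `expected_level` is a hypothetical helper, and $T = 1$ is an approximation of the simulated horizon.

```python
import math


def expected_level(x0, kappa, theta, T):
    """Expected process level at horizon T (in years)."""
    return theta + (x0 - theta) * math.exp(-kappa * T)


cases = {
    'baseline': dict(x0=1, kappa=1, theta=1.1),
    'trend': dict(x0=1, kappa=1, theta=2),
    'mean-reversion': dict(x0=1, kappa=1, theta=1),
}

for name, params in cases.items():
    print(f'{name:15s} {expected_level(T=1.0, **params):.4f}')
```

This yields about 1.0632 for the baseline case, 1.6321 for the trend case, and 1.0000 for the mean-reversion case. A single simulated path with positive volatility will deviate from these values; they hold only on average over many paths.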

Model Parameter Choice

The Vasicek (1977) model provides a certain degree of flexibility to simulate stochastic processes with different characteristics. However, in practical applications, the parameters would not be chosen arbitrarily but rather derived—through optimization methods—from market-observed data. This procedure is generally called model calibration and has a long tradition in computational finance. See, for example, Hilpisch (2015) for more details.
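To give a flavor of what deriving parameters from data can look like, the following sketch estimates κ and θ from a single path by least squares. It is deliberately simple and is an assumption of this example, not the option-based calibration that the references discuss: it rewrites the Euler step as a linear regression, $(x_t - x_s)/x_s = \kappa\theta\Delta t \cdot (1/x_s) - \kappa\Delta t + \text{noise}$, and the function name `fit_kappa_theta` is hypothetical.

```python
import numpy as np


def fit_kappa_theta(x, dt):
    """Least-squares estimates of kappa and theta from one path."""
    xs, xt = x[:-1], x[1:]
    y = (xt - xs) / xs                    # relative change per step
    A = np.column_stack([1.0 / xs, np.ones_like(xs)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    kappa = -b / dt                       # intercept equals -kappa * dt
    theta = a / (kappa * dt)              # slope equals kappa * theta * dt
    return kappa, theta


# Noise-free path generated with known (hypothetical) parameters.
dt, kappa, theta = 1 / 252, 1.0, 1.1
x = [1.0]
for _ in range(252):
    x.append(x[-1] + kappa * (theta - x[-1]) * dt)

print(fit_kappa_theta(np.array(x), dt))
```

On the noise-free path the regression recovers the parameters used to generate it. On noisy data, and in particular on short samples, such estimates can be imprecise, which is one reason why more careful calibration procedures are used in practice.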

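With new=True, as set in the following example, resetting the Simulation environment generates a new simulated process, as Figure 4-3 illustrates (the default, new=False, reuses the path simulated at initialization):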

In [32]: sim = Simulation(sym, 'r', 4, start='2024-1-1', end='2028-1-1',
                          periods=2 * 252, min_accuracy=0.485, x0=1,
                          kappa=2, theta=2, sigma=0.15,
                          normalize=True, new=True)
         sim.seed(100)

In [33]: for _ in range(10):
             sim.reset()
             sim.data[sym].plot(figsize=(10, 6), lw=1.0, c='b');
Figure 4-3. Multiple simulated, trending processes

The DQLAgent from “DQLAgent Python Class” works with this environment in the same way it worked with the NoisyData environment in the previous section. The following example uses the parametrization from before for the Simulation environment, which is a trending case. The agent learns to predict the future directional movement somewhat better than chance, with test accuracies between roughly 0.52 and 0.56:

In [34]: agent = DQLAgent(sim.symbol, sim.feature,
                          sim.n_features, sim, lr=0.0001)

In [35]: %time agent.learn(500)
         episode= 500 | treward= 265.00 | max= 286.00
         CPU times: user 42.1 s, sys: 5.87 s, total: 47.9 s
         Wall time: 40.1 s

In [36]: agent.test(5)
         total reward= 499 | accuracy=0.547
         total reward= 499 | accuracy=0.515
         total reward= 499 | accuracy=0.561
         total reward= 499 | accuracy=0.533
         total reward= 499 | accuracy=0.549

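The next example assumes a mean-reverting case, in which the DQLAgent does not predict the future directional movements as well as before; the test accuracies stay close to 0.5. Note that the example also raises min_accuracy from 0.485 to 0.6, so training episodes tend to end earlier, which is a second difference from the trending case. With that caveat, it seems that learning a trend might be easier than learning from simulated mean-reverting processes: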

In [37]: sim = Simulation(sym, 'r', 4, start='2024-1-1', end='2028-1-1',
                          periods=2 * 252, min_accuracy=0.6, x0=1,
                          kappa=1.25, theta=1, sigma=0.15,
                          normalize=True, new=True)
         sim.seed(100)

In [38]: agent = DQLAgent(sim.symbol, sim.feature,
                          sim.n_features, sim, lr=0.0001)

In [39]: %time agent.learn(500)
         episode= 500 | treward=  12.00 | max=  70.00
         CPU times: user 17.8 s, sys: 2.66 s, total: 20.4 s
         Wall time: 16.3 s

In [40]: agent.test(5)
         total reward= 499 | accuracy=0.487
         total reward= 499 | accuracy=0.495
         total reward= 499 | accuracy=0.511
         total reward= 499 | accuracy=0.487
         total reward= 499 | accuracy=0.449
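The two sets of test results can be summarized with a few lines of arithmetic. The sketch below averages the accuracies printed above and compares them with the standard error of a pure coin-flip guesser over one test episode. The 499 steps per episode are taken from the printed output, where the "total reward" field shows the step counter of the test loop; treating the individual predictions as independent is a simplifying assumption.

```python
import math
from statistics import mean

trend = [0.547, 0.515, 0.561, 0.533, 0.549]     # accuracies from In [36]
mean_rev = [0.487, 0.495, 0.511, 0.487, 0.449]  # accuracies from In [40]
n_steps = 499                                    # steps per test episode

# Standard error of the accuracy of a coin-flip guesser over n_steps.
se_coin = math.sqrt(0.5 * 0.5 / n_steps)

print(f'trend:          {mean(trend):.3f}')
print(f'mean-reversion: {mean(mean_rev):.3f}')
print(f'coin-flip s.e.: {se_coin:.3f}')
```

The averages are about 0.541 for the trending case and 0.486 for the mean-reverting case, against a coin-flip standard error of roughly 0.022 per episode. The edge in the trending case is therefore modest, and more test episodes would be needed for a firm conclusion.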

Conclusions

The addition of white noise to a historical financial time series allows, in principle, the generation of an unlimited number of data sets to train a DQL agent. Varying the degree of noise (i.e., the standard deviation) may cause the adjusted time series data to be close to or very different from the original time series. In turn, this can make it easier or more difficult for the DQL agent to learn to distinguish information from the added noise.

Simulation approaches were introduced to finance long before the widespread adoption of computers in the industry. Boyle (1977) is considered the seminal article in this regard. Glasserman (2004) provides a comprehensive overview of MCS techniques for finance.

Using MCS for stochastic processes allows the simulation of trending and mean-reverting processes. Typical trending financial time series are stock index levels or individual stock prices. Typical mean-reverting financial time series are foreign exchange (FX) rates or commodity prices.

In this chapter, the parameters for the simulation are chosen ad hoc. In a more realistic setting, appropriate parameter values could be found, for example, through the calibration of the Vasicek (1977) model to the prices of liquidly traded options, an approach with a long tradition in computational finance.3

The examples in this chapter show that the DQLAgent can more easily learn about trending time series than about mean-reverting ones. The next chapter turns our attention to generative approaches to the creation of synthetic time series data based on neural networks.

References

  • Boyle, Phelim P. “Options: A Monte Carlo Approach.” Journal of Financial Economics 4, no. 3 (1977): 323–338.

  • Brennan, M. J., and E. S. Schwartz. “An Equilibrium Model of Bond Pricing and a Test of Market Efficiency.” Journal of Financial and Quantitative Analysis 15, no. 3 (1980): 361–372.

  • Glasserman, Paul. Monte Carlo Methods in Financial Engineering. New York: Springer, 2004.

  • Halevy, Alon, Peter Norvig, and Fernando Pereira. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24, no. 2 (May 2009): 8–12.

  • Hilpisch, Yves. Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration, and Hedging. Chichester, UK: Wiley Finance, 2015.

  • Hilpisch, Yves. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol, CA: O’Reilly, 2018.

  • Vasicek, Oldrich. “An Equilibrium Characterization of the Term Structure.” Journal of Financial Economics 5, no. 2 (November 1977): 177–188.

DQLAgent Python Class

The following Python code is from the dqlagent.py module and contains the DQLAgent class used in this chapter:

#
# Deep Q-Learning Agent
#
# (c) Dr. Yves J. Hilpisch
# Reinforcement Learning for Finance
#

import os
import random
import warnings
import numpy as np
import tensorflow as tf
from tensorflow import keras
from collections import deque
from keras.layers import Dense, Flatten
from keras.models import Sequential

warnings.simplefilter('ignore')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'


from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

opt = keras.optimizers.legacy.Adam


class DQLAgent:
    def __init__(self, symbol, feature, n_features, env, hu=24, lr=0.001):
        self.epsilon = 1.0
        self.epsilon_decay = 0.9975
        self.epsilon_min = 0.1
        self.memory = deque(maxlen=2000)
        self.batch_size = 32
        self.gamma = 0.5
        self.trewards = list()
        self.max_treward = -np.inf
        self.n_features = n_features
        self.env = env
        self.episodes = 0
        self._create_model(hu, lr)
        
    def _create_model(self, hu, lr):
        self.model = Sequential()
        self.model.add(Dense(hu, activation='relu',
                             input_dim=self.n_features))
        self.model.add(Dense(hu, activation='relu'))
        self.model.add(Dense(2, activation='linear'))
        self.model.compile(loss='mse', optimizer=opt(learning_rate=lr))
        
    def _reshape(self, state):
        state = state.flatten()
        return np.reshape(state, [1, len(state)])
            
    def act(self, state):
        if random.random() < self.epsilon:
            return self.env.action_space.sample()
        return np.argmax(self.model.predict(state)[0])
        
    def replay(self):
        batch = random.sample(self.memory, self.batch_size)
        for state, action, next_state, reward, done in batch:
            if not done:
                reward += self.gamma * np.amax(
                    self.model.predict(next_state)[0])
                target = self.model.predict(state)
                target[0, action] = reward
                self.model.fit(state, target, epochs=1, verbose=False)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
            
    def learn(self, episodes):
        for e in range(1, episodes + 1):
            self.episodes += 1
            state, _ = self.env.reset()
            state = self._reshape(state)
            treward = 0
            for f in range(1, 5000):
                self.f = f
                action = self.act(state)
                next_state, reward, done, trunc, _ = self.env.step(action)
                treward += reward
                next_state = self._reshape(next_state)
                self.memory.append(
                    [state, action, next_state, reward, done])
                state = next_state 
                if done:
                    self.trewards.append(treward)
                    self.max_treward = max(self.max_treward, treward)
                    templ = f'episode={self.episodes:4d} | '
                    templ += f'treward={treward:7.3f}'
                    templ += f' | max={self.max_treward:7.3f}'
                    print(templ, end='\r')
                    break
            if len(self.memory) > self.batch_size:
                self.replay()
        print()
        
    def test(self, episodes, min_accuracy=0.0,
             min_performance=0.0, verbose=True,
             full=True):
        ma = self.env.min_accuracy
        self.env.min_accuracy = min_accuracy
        if hasattr(self.env, 'min_performance'):
            mp = self.env.min_performance
            self.env.min_performance = min_performance
            self.performances = list()
        for e in range(1, episodes + 1):
            state, _ = self.env.reset()
            state = self._reshape(state)
            for f in range(1, 5001):
                action = np.argmax(self.model.predict(state)[0])
                state, reward, done, trunc, _ = self.env.step(action)
                state = self._reshape(state)
                if done:
                    templ = f'total reward={f:4d} | '
                    templ += f'accuracy={self.env.accuracy:.3f}'
                    if hasattr(self.env, 'min_performance'):
                        self.performances.append(self.env.performance)
                        templ += f' | performance={self.env.performance:.3f}'
                    if verbose:
                        if full:
                            print(templ)
                        else:
                            print(templ, end='\r')
                    break
        self.env.min_accuracy = ma
        if hasattr(self.env, 'min_performance'):
            self.env.min_performance = mp
        print()
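The exploration schedule of the agent follows from three attributes: `epsilon` starts at 1.0, is multiplied by `epsilon_decay = 0.9975` on every call of `replay()`, and stops decaying once it is no longer above `epsilon_min = 0.1`. The sketch below replays this rule outside of the class; the function names are hypothetical. Since `replay()` runs at most once per episode, and only once the memory holds more than `batch_size` entries, the number of replay calls is at most the number of episodes.

```python
def epsilon_after(n_replays, epsilon=1.0, decay=0.9975, eps_min=0.1):
    """Exploration rate after n_replays calls of DQLAgent.replay()."""
    for _ in range(n_replays):
        if epsilon > eps_min:
            epsilon *= decay
    return epsilon


def replays_until_min(epsilon=1.0, decay=0.9975, eps_min=0.1):
    """Number of replay calls after which epsilon stops decaying."""
    n = 0
    while epsilon > eps_min:
        epsilon *= decay
        n += 1
    return n


print(round(epsilon_after(250), 4))
print(round(epsilon_after(500), 4))
print(replays_until_min())
```

After 250 replay calls the exploration rate is still about 0.53, and after 500 it is about 0.29; the floor is reached only after 920 calls. With the episode counts used in this chapter, the agent therefore still takes a substantial share of random actions at the end of training.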

1 For more details on MCS with Python, see Chapter 12 of Hilpisch (2018). The Vasicek model with proportional volatility is also called the Brennan-Schwartz model. It dates back to the Brennan and Schwartz (1980) paper.

2 The careful observer will notice that the three processes do not start at exactly the same point on the graph. This is because the initial value gets “lost” after the calculation of the log returns and the cleanup of the DataFrame object.

3 For details, numerical techniques, and Python code examples in the context of financial model calibration, see Hilpisch (2015).
