Chapter 4. Simulated Data
It is often said that data is the new oil, but this analogy is not quite right. Oil is a finite resource that must be extracted and refined, whereas data is an infinite resource that is constantly being generated and refined.
Halevy et al. (2009)
A major drawback of the financial environment introduced in the previous chapter is that it relies by default on a single historical financial time series. Such a data set is too limited to train a deep Q-learning (DQL) agent. It is like training an AI on a single game of chess and expecting it to play chess well in general.
This chapter introduces simulation-based approaches to augmenting the data available for training a DQL agent. The first approach, introduced in “Noisy Time Series Data”, is to add random noise to a static financial time series. Financial time series data is generally agreed to contain noise already, as opposed to price movements or returns that are driven by information. The idea is nevertheless to train the agent on a large number of similar time series in the hope that it learns to distinguish information from noise.
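The noise-based idea can be illustrated with a minimal sketch. It assumes additive Gaussian noise whose standard deviation is a fraction of the standard deviation of the original series. The function name `add_noise` and the parameters `noise_scale` and `n_paths` are hypothetical and chosen for illustration only; the chapter's own implementation may differ.

```python
import numpy as np


def add_noise(prices, noise_scale=0.01, n_paths=5, seed=None):
    """Return `n_paths` noisy copies of a one-dimensional price series.

    Assumption (illustrative only): the noise is additive and Gaussian,
    with standard deviation `noise_scale` times the standard deviation
    of the original series.
    """
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    sigma = noise_scale * prices.std()
    # One independent noise vector per generated series
    noise = rng.normal(0.0, sigma, size=(n_paths, prices.size))
    # Broadcasting adds the original series to every row of noise
    return prices + noise


# Stand-in for a historical series (synthetic, not real market data)
prices = np.linspace(100.0, 110.0, 250)
noisy = add_noise(prices, noise_scale=0.05, n_paths=3, seed=42)
print(noisy.shape)
```

Each row of the result is a different variant of the same underlying series, so an agent could in principle be trained on a fresh variant in every episode.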
The second approach, discussed in “Simulated Time Series Data”, is to generate financial time series data through simulation under certain constraints and assumptions. In general, a stochastic differential equation is assumed for the dynamics of the time series. The time series is then simulated given a discretization scheme and appropriate ...
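To make the simulation idea concrete, the following sketch simulates paths with an Euler discretization. The text does not specify the stochastic differential equation at this point, so geometric Brownian motion, dS = mu * S dt + sigma * S dW, is used here purely as an assumed example. The function name `simulate_gbm_euler` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_gbm_euler(s0=100.0, mu=0.05, sigma=0.2, T=1.0,
                       n_steps=252, n_paths=10, seed=None):
    """Simulate price paths with an Euler scheme.

    Assumption (illustrative only): the dynamics follow geometric
    Brownian motion, dS = mu * S dt + sigma * S dW.
    Returns an array of shape (n_paths, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Standard normal increments, one per path and time step
    z = rng.standard_normal((n_paths, n_steps))
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = s0
    for t in range(n_steps):
        # Euler step: S_{t+dt} = S_t * (1 + mu * dt + sigma * sqrt(dt) * z)
        paths[:, t + 1] = paths[:, t] * (
            1 + mu * dt + sigma * np.sqrt(dt) * z[:, t]
        )
    return paths


paths = simulate_gbm_euler(seed=100)
print(paths.shape)
```

Unlike the noise-based approach, every call with a different seed yields entirely new paths, so the amount of training data is limited only by computation time.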