Chapter 10. Synthetic Data Generation and The Hidden Markov Model in Finance

The data does not have to be rooted in the real world to have value: it can be fabricated and slotted in where some is missing or hard to get hold of.

Ahuja (2020)

Synthetic data generation has been gaining attention in finance due to rising concerns about confidentiality and increasing data requirements. So, instead of working with real data, why not feed your model with synthetic data, as long as it mimics the requisite statistical properties? It sounds appealing, doesn’t it? Synthetic data generation is one part of this chapter; the other part is devoted to an underappreciated but important and interesting topic: the hidden Markov model (HMM). You may be tempted to ask: what is the common ground between synthetic data and the HMM? Well, we can generate synthetic data from an HMM, and this is one of the aims of this chapter. The other aim is to introduce these two topics, as both are often used in machine learning.
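To make the link between the two topics concrete, here is a minimal sketch of how synthetic returns could be drawn from a two-state HMM with Gaussian emissions. It is an illustration only, not necessarily the approach taken later in the chapter: the function name `sample_hmm`, the "calm" and "turbulent" state labels, and every parameter value are hypothetical and chosen for readability, not estimated from real data.

```python
import random


def sample_hmm(n_steps, start_prob, trans_prob, means, stds, seed=0):
    """Draw a hidden state path and observations from a Gaussian HMM."""
    rng = random.Random(seed)
    n_states = len(start_prob)
    states, observations = [], []
    # Draw the initial hidden state from the start distribution.
    state = rng.choices(range(n_states), weights=start_prob)[0]
    for _ in range(n_steps):
        states.append(state)
        # Emit an observation from the Gaussian attached to the current state.
        observations.append(rng.gauss(means[state], stds[state]))
        # Move to the next hidden state using the current state's transition row.
        state = rng.choices(range(n_states), weights=trans_prob[state])[0]
    return states, observations


# Hypothetical parameters: state 0 = calm market, state 1 = turbulent market.
start_prob = [0.9, 0.1]
trans_prob = [[0.95, 0.05],
              [0.10, 0.90]]
means = [0.0005, -0.0010]   # assumed mean daily return per state
stds = [0.005, 0.020]       # assumed daily volatility per state

states, returns = sample_hmm(500, start_prob, trans_prob, means, stds, seed=42)
```

The resulting `returns` series switches between low- and high-volatility stretches according to the hidden `states` path, which is the kind of regime behavior an HMM is meant to capture. In practice, the parameters would be estimated from real data first, so that the synthetic series inherits its statistical properties.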

Synthetic Data Generation

The confidentiality, sensitivity, and cost of financial data greatly restrict its usage. This, in turn, hinders the progress and dissemination of useful knowledge in finance. Synthetic data addresses these drawbacks and helps researchers and practitioners conduct their analyses and disseminate the results.

Synthetic data is data generated by a process designed to mimic the statistical properties of real data. Even though there is a belief ...
