Appendix A. Synthetic Data Generation Tools
For domain-specific data generation tools, you have a variety of options.
Many projects train generative models, such as GANs and flow-based models, on real-world data and then use the trained generators as the source of synthetic data. Table A-1 lists several GAN-based methods.
| Methods and tools | Description | Further reading | Type |
|---|---|---|---|
| | A GAN-based data synthesizer that can generate synthetic tabular data with high fidelity | | Tabular |
| | Outdated and superseded by CTGAN | | Tabular |
| | Creates fake synthetic datasets with enhanced privacy guarantees | | Tabular |
| WGAN-GP | Recommended for training the GAN; suffers less from mode-collapse and has a more meaningful loss than other GAN-based data generation tools | “On the Generation and Evaluation of Synthetic Tabular Data Using GANs” | Tabular |
| | “Generates synthetic data that simulates a given dataset and applies DP techniques to achieve a strong privacy guarantee” | | Tabular |
| | “[A] generative adversarial network for generating multilabel discrete patient records [that] can generate both binary and count variables (i.e., medical codes such as diagnosis codes, medication codes, or procedure codes)” | “Generating Multi-label Discrete Patient Records Using Generative Adversarial Networks” | Tabular |
| | Produces synthetic data instances ... | | |
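
The workflow shared by the tools in Table A-1 (fit a generative model on real data, then sample from it) can be illustrated with a minimal sketch. The example below is not a GAN: it swaps in one independent Gaussian per numeric column as a deliberately simple stand-in, so it ignores correlations between columns that GAN-based tabular synthesizers aim to capture. The function names `fit_gaussian_generator` and `sample_synthetic_rows`, and the small `real_rows` table, are hypothetical and exist only for illustration.

```python
import random
import statistics


def fit_gaussian_generator(rows):
    """Fit one independent Gaussian per numeric column.

    This is a simple stand-in for training a GAN: the "model" is just
    a (mean, standard deviation) pair for each column.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic_rows(params, n, seed=None):
    """Draw n synthetic rows from the fitted per-column Gaussians."""
    rng = random.Random(seed)  # local RNG so sampling is reproducible
    return [
        tuple(rng.gauss(mu, sigma) for mu, sigma in params)
        for _ in range(n)
    ]


# Hypothetical "real" table with two numeric columns: (age, income).
real_rows = [
    (34, 52000.0),
    (45, 61000.0),
    (29, 48000.0),
    (52, 75000.0),
    (41, 58000.0),
]

params = fit_gaussian_generator(real_rows)
synthetic_rows = sample_synthetic_rows(params, n=1000, seed=0)

# Compare a summary statistic of the real and synthetic data.
real_mean_age = statistics.mean(row[0] for row in real_rows)
synthetic_mean_age = statistics.mean(row[0] for row in synthetic_rows)
print(f"real mean age: {real_mean_age:.1f}")
print(f"synthetic mean age: {synthetic_mean_age:.1f}")
```

The split into a fitting step and a sampling step mirrors how a trained generator is used: once fitted, the model can produce as many synthetic rows as needed without touching the real data again. A realistic synthesizer would also have to handle categorical columns, dependencies between columns, and any privacy guarantees, none of which this sketch attempts.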