Chapter 10: Synthetic tabular data: copulas vs enhanced GANs
Abstract
In this chapter, you will learn how to create your own synthetic tabular data in Python, using two popular techniques: GANs (generative adversarial networks) and copulas. One example uses a real-life insurance data set: with copulas, you will create an alternate (synthetic) data set that closely matches the distribution of the observations in your training set, including all the correlations. Another example is the diabetes data set; the goal is to predict diabetes, and the context is supervised classification. You will learn how to synthesize this data set using GANs. I also discuss data transformations, how to deal with missing data, and modern tools to assess ...
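To make the copula idea in the abstract concrete, here is a minimal sketch of a Gaussian copula synthesizer. It is not the chapter's own code: the function name `gaussian_copula_synthesize`, the helper `spearman`, and the two-column toy data (a stand-in for "age" and "charges" in an insurance-like table) are all assumptions made for illustration. The sketch assumes purely numeric columns with no missing values.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf)   # uniform score -> normal score
_to_uniform = np.vectorize(_STD_NORMAL.cdf)      # normal score -> uniform score


def gaussian_copula_synthesize(real, n_samples, seed=0):
    """Hypothetical helper: draw synthetic rows via a Gaussian copula."""
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)
    n_obs, n_cols = real.shape

    # Step 1: replace each value by its within-column rank, rescale the
    # ranks to (0, 1), and map them to standard normal scores.
    ranks = real.argsort(axis=0).argsort(axis=0) + 1
    z = _to_normal(ranks / (n_obs + 1.0))

    # Step 2: the correlation matrix of the normal scores captures the
    # dependence structure between columns.
    corr = np.corrcoef(z, rowvar=False)

    # Step 3: sample new correlated normal scores.
    z_new = rng.multivariate_normal(np.zeros(n_cols), corr, size=n_samples)

    # Step 4: map back to the original scale through the empirical
    # quantiles of each real column, which preserves the marginals.
    u_new = _to_uniform(z_new)
    return np.column_stack(
        [np.quantile(real[:, j], u_new[:, j]) for j in range(n_cols)]
    )


def spearman(data):
    """Rank correlation between the first two columns."""
    ranks = data.argsort(axis=0).argsort(axis=0)
    return np.corrcoef(ranks, rowvar=False)[0, 1]


# Made-up toy data (NOT the insurance data set from the chapter):
# two correlated columns with very different marginal shapes.
rng = np.random.default_rng(42)
latent = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
real = np.column_stack([40 + 10 * latent[:, 0], np.exp(7 + 0.6 * latent[:, 1])])

synth = gaussian_copula_synthesize(real, n_samples=2000, seed=1)
print("rank correlation, real :", round(spearman(real), 3))
print("rank correlation, synth:", round(spearman(synth), 3))
```

The design choice here is to separate the marginal distributions (handled by empirical quantiles) from the dependence structure (handled by a single correlation matrix). Because the quantile step interpolates between observed values, synthetic values in this sketch never fall outside the range seen in the training data.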
Get Synthetic Data and Generative AI now with the O’Reilly learning platform.