Chapter 2. The Synthesis Process

In the previous chapter, we defined the types of synthetic data, outlined its benefits, and explained how to generate it. This chapter examines the practical implementation of data synthesis in the enterprise.

The implementation of data synthesis at the enterprise level has two key components: the process and the structure. The process comprises the key steps for integrating synthesis into a data pipeline. The structure is typically operationalized through a Synthesis Center of Excellence, a new entity within the organization that supports data synthesis implementations across the enterprise in terms of process, technology, and governance. This chapter describes the process and structure in some detail to provide guidance and present critical success factors.

In practice, the data synthesis capabilities described here may be deployed by large organizations as well as solo practitioners in many possible scenarios. Therefore, the following descriptions will need to be tailored to accommodate specific circumstances.

Data Synthesis Projects

Data synthesis projects involve two groups of processes: those that generate data and validate the outputs, and those that prepare real data so that it can be synthesized. Validation includes both the evaluation of data utility and privacy assurance. In this section, we describe these processes and provide guidance on their application.

Data Synthesis Steps

A general ...

