7

Using Synthetic Data in Data-Centric Machine Learning

In previous chapters, we discussed various approaches to improving data quality for machine learning purposes through better collection and labeling.

Although human labelers, data ownership, and technical data quality improvement practices are critical to data centricity, there are limits to what individuals can label or create, and to what can be captured through empirical observation.

Synthetic data has the potential to fill in these gaps and produce comprehensive training data at a fraction of the cost and time of other approaches.

This chapter provides an introduction to synthetic data generation. We will cover the following main topics:

  • What synthetic data is and why ...
