2

Annotating Real Data

The fuel of the machine learning (ML) engine is data. Data is available in almost every part of our technology-driven world. However, ML models usually need to be trained or evaluated on annotated data, not just raw data. Raw data by itself is therefore of limited use for ML; what ML models actually need is annotated data.

In this chapter, we will learn why ML models need annotated data. We will see why the annotation process is expensive, error-prone, and biased. You will also be introduced to the annotation process for a number of ML tasks, such as image classification, semantic segmentation, and instance segmentation, and we will highlight the main annotation problems. At the same time, we will understand why ideal ground truth generation ...
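The three tasks named above differ mainly in what the annotation for a single image looks like. The sketch below is a hypothetical toy example, not the book's own format. It assumes a 4x4 image containing two cats on grass, with class ID 0 for background and 1 for cat. It also assumes the simplest single-label case for classification. The helper names `count_labels` and `count_instances` are illustrative only.

```python
# Hypothetical toy annotations for one 4x4 image showing two cats on grass.
# Assumed class IDs for illustration: 0 = background (grass), 1 = cat.

# Image classification (single-label case): one label for the whole image.
classification_label = "cat"

# Semantic segmentation: one class ID per pixel.
# Both cats share class 1, so the two animals are not told apart.
semantic_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Instance segmentation: one instance ID per pixel (0 = no object),
# plus a class label attached to each instance ID.
instance_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
instance_classes = {1: "cat", 2: "cat"}


def count_labels(mask):
    """Return the number of per-pixel labels contained in a mask."""
    return sum(len(row) for row in mask)


def count_instances(mask):
    """Return the number of distinct non-zero instance IDs in a mask."""
    return len({value for row in mask for value in row if value != 0})


print(count_labels(semantic_mask))     # 16 pixel labels vs. 1 image label
print(count_instances(instance_mask))  # 2 separate cats
```

Even in this tiny example, a segmentation annotation carries one label per pixel rather than one per image. This hints at why such annotations take far more effort to produce. Only the instance mask records that there are two separate cats.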
