3

Labeling Data

Artificial Intelligence (AI) models are only as good as the data they are trained on, so high-quality data is vitally important.

AI algorithms generally start in a basic, simplified form. In supervised learning, accurately labeling (also known as annotating) data is an essential step in training an algorithm, improving its predictions, and ensuring that what it learns is correct. Numerous studies, reports, and surveys suggest that data scientists spend between 50% and 80% of their time on data preparation and preprocessing (see Figure 3.1), and data labeling is usually a large part of this.

Figure 3.1 – Distribution of time allocated to machine learning tasks
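To make the idea of labeled data concrete, the following is a minimal sketch of what a labeled dataset for supervised emotion analysis might look like. The texts, the label names, and the `label_distribution` helper are all hypothetical and chosen purely for illustration; they are not taken from any real dataset.

```python
from collections import Counter

# Hypothetical toy dataset: each item pairs a text with the emotion
# label that a human annotator assigned to it.
labeled_data = [
    ("I can't believe we won!", "joy"),
    ("This is the worst day ever.", "sadness"),
    ("Why would you do that to me?", "anger"),
    ("What a lovely surprise!", "joy"),
]


def label_distribution(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)


distribution = label_distribution(labeled_data)
print(distribution)
```

Checking the label distribution like this is a common first sanity check, since a heavily skewed set of labels can affect what a model learns.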

