Chapter 1. Training Data Introduction
Data is all around us—videos, images, text, documents, as well as geospatial, multi-dimensional data, and more. Yet, in its raw form, this data is of little use to supervised machine learning (ML) and artificial intelligence (AI). How do we make use of this data? How do we record our intelligence so it can be reproduced through ML and AI? The answer is the art of training data—the discipline of making raw data useful.
In this book you will learn:
- All-new training data (AI data) concepts
- The day-to-day practice of training data
- How to improve training data efficiency
- How to transform your team to be more AI/ML-centric
- Real-world case studies
Before we can cover some of these concepts, we first have to understand the foundations, which this chapter will unpack.
Training data is about molding, reforming, shaping, and digesting raw data into new forms: creating new meaning out of raw data to solve problems. These acts of creation and destruction sit at the intersection of subject matter expertise, business needs, and technical requirements. It’s a diverse set of activities that crosscut multiple domains.
At the heart of these activities is annotation. Annotation produces structured data that is ready to be consumed by a machine learning model. Without annotation, raw data is considered to be unstructured, usually less valuable, and often not usable for supervised learning. That’s why training data is required for modern machine learning ...
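To make the contrast between raw and annotated data concrete, here is a minimal sketch in Python. The `BoxAnnotation` class, its field names, the file name, and the labels are all hypothetical and chosen only for illustration; real annotation tools and formats define their own schemas.

```python
from dataclasses import dataclass, asdict


@dataclass
class BoxAnnotation:
    """Hypothetical annotation: one labeled bounding box on one image."""
    file_name: str
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Box area in pixels, a simple example of a derived, structured value.
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


# Raw data: only a file reference, with no meaning attached yet.
raw_item = "street_001.jpg"  # illustrative file name

# Annotated data: a person records what is in the image and where.
annotations = [
    BoxAnnotation(raw_item, "car", 40, 60, 200, 180),
    BoxAnnotation(raw_item, "pedestrian", 220, 50, 260, 170),
]

# Structured records like these are the kind of input a supervised model can consume.
training_records = [asdict(a) for a in annotations]
labels = [record["label"] for record in training_records]
```

The raw file name alone tells a model nothing about what to learn; the annotation records pair that file with explicit labels and locations, which is the structure supervised learning depends on.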