Training Data for Machine Learning

by Anthony Sarkis
November 2023
Beginner to intermediate
329 pages
9h 3m
English
O'Reilly Media, Inc.

Chapter 1. Training Data Introduction

Data is all around us—videos, images, text, documents, as well as geospatial, multi-dimensional data, and more. Yet, in its raw form, this data is of little use to supervised machine learning (ML) and artificial intelligence (AI). How do we make use of this data? How do we record our intelligence so it can be reproduced through ML and AI? The answer is the art of training data—the discipline of making raw data useful.

In this book you will learn:

  • All-new training data (AI data) concepts

  • The day-to-day practice of training data

  • How to improve training data efficiency

  • How to transform your team to be more AI/ML-centric

  • Real-world case studies

Before we can cover some of these concepts, we first have to understand the foundations, which this chapter will unpack.

Training data is about molding, reforming, shaping, and digesting raw data into new forms: creating new meaning out of raw data to solve problems. These acts of creation and destruction sit at the intersection of subject matter expertise, business needs, and technical requirements. It’s a diverse set of activities that crosscut multiple domains.

At the heart of these activities is annotation. Annotation produces structured data that is ready to be consumed by a machine learning model. Without annotation, raw data is considered to be unstructured, usually less valuable, and often not usable for supervised learning. That’s why training data is required for modern machine learning ...
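To make the contrast between raw and annotated data concrete, here is a minimal sketch. It is not taken from the book: the `Annotation` and `Sample` classes, the file path, and the bounding-box layout are all hypothetical names chosen for illustration. It only shows the general idea that an annotation attaches structure (here, a label and a region) to an otherwise unstructured file.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    """Hypothetical annotation: a class label plus a pixel bounding box."""
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed layout


@dataclass
class Sample:
    """Hypothetical training sample: a raw file plus zero or more annotations."""
    raw_path: str
    annotations: List[Annotation] = field(default_factory=list)

    def has_supervision(self) -> bool:
        # A sample with no annotations carries no target for a supervised model.
        return len(self.annotations) > 0


# Raw data only: the file exists, but nothing says what it contains.
raw = Sample(raw_path="images/street_001.jpg")

# The same file after annotation: structured and ready for a supervised pipeline.
labeled = Sample(
    raw_path="images/street_001.jpg",
    annotations=[Annotation(label="car", box=(34, 50, 210, 180))],
)

print(raw.has_supervision())      # False
print(labeled.has_supervision())  # True
```

Real annotation schemas are usually richer (attributes, polygons, sequences, relationships), so treat this as a starting shape rather than a recommended format.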

Publisher Resources

ISBN: 9781492094517