Training Data for Machine Learning

by Anthony Sarkis
November 2023
Beginner to intermediate
329 pages
9h 3m
English
O'Reilly Media, Inc.

Chapter 5. Workflow

Introduction

Training data is about creating human meaning with data. Humans are, naturally, a vital component of that. In this chapter, I will cover the nuts and bolts of the human workflow of training data.

I will first provide a brief overview of how workflow is the glue between tech and people. I will start with the motivations for human tasks and then move on to the core themes of workflow:

  • Getting started

  • Quality assurance

  • Analytics and data exploration

  • Data flow

  • Direct annotation

In “Getting Started with Human Tasks,” I’ll cover the basics: why schemas tend to stick around, user roles, training, and more. The next most crucial thing to understand is quality assurance (QA). There I focus on the structural level: the motivations for trusting your human annotators, the standard review loop, and common causes of errors.

After you have started and done some basic QA, you will want to learn how to analyze your tasks, datasets, and more. The analytics and data exploration section leads into using models to debug your data and, more generally, how to work with models.

Data flow (getting data moving, in front of humans, and then to models) is a key part of workflow.

Finally, I will wrap up the chapter by taking a deep dive into direct annotation itself. This will cover high-level concepts like business process integration, supervising existing data, and interactive automations, as well as a detailed example of video annotation.

Glue ...

ISBN: 9781492094517