Chapter 5. Workflow

Introduction

Training data is about creating human meaning with data. Humans are, naturally, a vital component of that. In this chapter, I will cover the nuts and bolts of the human workflow of training data.

I will first give a brief overview of how workflow acts as the glue between tech and people. Then I cover the motivations for human tasks before moving on to the core themes of workflow:

  • Getting started

  • Quality assurance

  • Analytics and data exploration

  • Data flow

  • Direct annotation

In “Getting Started with Human Tasks,” I’ll cover the basics, such as why schemas tend to stick around, user roles, training, and more. The next most crucial topic is quality assurance (QA). There, I focus on QA at a structural level: the motivations for trusting your human annotators, the standard review loop, and common causes of errors.

Once you have started and done some basic QA, you will want to learn how to analyze your tasks, datasets, and more. The analytics section leads into using models to debug your data and, more generally, how to work with models.

Data flow (getting data moving, in front of humans, and then on to models) is a key part of workflow.

Finally, I will wrap up the chapter by taking a deep dive into direct annotation itself. This will cover high-level concepts like business process integration, supervising existing data, and interactive automations, as well as a detailed example of video annotation.

Glue ...

Get Training Data for Machine Learning now with the O’Reilly learning platform.