Chapter 8. Automation

Introduction

Automation can help create robust processes, reduce tedious workloads, and improve quality. The first topic I’ll cover in this chapter is pre-labeling—the idea of running a model before annotation. I’ll cover the basics and then step through more advanced concepts like pre-labeling just a portion of the data.

Next come interactive automations, in which a user adds information to help the algorithm. The end goal of interactive automations is to make annotation work a more natural extension of human thought. For example, drawing a box and automatically getting back a tighter location marked by a polygon feels intuitive to us.

Quality assurance (QA) is one of the common uses of training data tools. I'll cover exciting new methods like using the model to debug the ground truth. Other tools automatically check base cases and look at the data for general reasonableness.
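To make the idea of using the model to debug the ground truth a little more concrete, here is a minimal sketch. It is one simple interpretation rather than the method this chapter goes on to describe: it assumes each sample already has a human label plus a model prediction with a confidence score, and it flags the samples where a confident prediction disagrees with the human label. The names `Sample`, `flag_for_review`, and `min_confidence`, as well as the 0.9 threshold, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record pairing a human label with a model prediction."""
    sample_id: str
    human_label: str
    model_label: str
    model_confidence: float  # assumed to be in the range [0.0, 1.0]


def flag_for_review(samples, min_confidence=0.9):
    """Return samples where a confident model prediction disagrees with the human label.

    The result is sorted so the most confident disagreements come first.
    The 0.9 default is an illustrative assumption, not a recommended value.
    """
    flagged = [
        s for s in samples
        if s.model_label != s.human_label and s.model_confidence >= min_confidence
    ]
    return sorted(flagged, key=lambda s: s.model_confidence, reverse=True)


# Made-up example data, for illustration only.
samples = [
    Sample("a", human_label="cat", model_label="cat", model_confidence=0.98),
    Sample("b", human_label="cat", model_label="dog", model_confidence=0.95),
    Sample("c", human_label="dog", model_label="cat", model_confidence=0.55),
    Sample("d", human_label="dog", model_label="cat", model_confidence=0.99),
]

review_queue = flag_for_review(samples)
for s in review_queue:
    print(s.sample_id, s.human_label, s.model_label, s.model_confidence)
```

A flagged sample is only a candidate for a second look. The model may be the one that is wrong, so a person still decides whether the ground truth needs correcting.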

Pre-labeling, interactive automations, and QA tools will get you far. After covering the foundations, I’ll walk through key aspects of data exploration and discovery. What if you could query the data and only label the most relevant parts? This area includes concepts like filtering an unknown dataset down to a manageable size and more.

I will touch on data augmentation, common ways it’s used, and cautions to be aware of. When we augment data, we derive new data based on the existing base information. From that viewpoint, it’s easier to think of the base information as the core training data and ...
