Chapter 8. Automation
Introduction
Automation can help create robust processes, reduce tedious workloads, and improve quality. The first topic I’ll cover in this chapter is pre-labeling—the idea of running a model before annotation. I’ll cover the basics and then step through more advanced concepts like pre-labeling just a portion of the data.
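To make the idea of pre-labeling only a portion of the data concrete, here is a minimal sketch of one possible reading: keep a model's prediction as a pre-label only when its confidence clears a threshold, and leave everything else for annotators to label from scratch. The names `Sample`, `pre_label`, `toy_predict`, and the `min_confidence` value are illustrative assumptions, not an API from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    sample_id: str
    pre_label: Optional[str] = None  # set by the model, still subject to human review


def pre_label(
    samples: List[Sample],
    predict: Callable[[Sample], Tuple[str, float]],
    min_confidence: float = 0.8,  # hypothetical cutoff; tune for your data
) -> Tuple[List[Sample], List[Sample]]:
    """Attach a pre-label only where the model is confident enough."""
    labeled, unlabeled = [], []
    for sample in samples:
        label, confidence = predict(sample)
        if confidence >= min_confidence:
            sample.pre_label = label
            labeled.append(sample)
        else:
            unlabeled.append(sample)  # annotators start from a blank slate
    return labeled, unlabeled


def toy_predict(sample: Sample) -> Tuple[str, float]:
    """Stand-in for a real model: returns (label, confidence)."""
    scores = {"a": ("cat", 0.95), "b": ("dog", 0.40), "c": ("cat", 0.85)}
    return scores[sample.sample_id]


labeled, unlabeled = pre_label([Sample("a"), Sample("b"), Sample("c")], toy_predict)
```

In this sketch the pre-labels are suggestions for a human to confirm or correct, not final annotations.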
Next come interactive automations, in which a user adds information to help the algorithm. The end goal of interactive automations is to make annotation work a more natural extension of human thought. For example, drawing a box and automatically getting a tighter location marked by a polygon feels intuitive to us.
Quality assurance (QA) is one of the common uses of training data tools. I cover exciting new methods like using the model to debug the ground truth. Other tools automatically check base cases and look at the data for general reasonableness.
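As a rough illustration of using the model to debug the ground truth, the sketch below flags records where a confident model prediction disagrees with the human label, so that a reviewer can take a second look. The record fields (`id`, `label`, `prediction`, `confidence`) and the threshold are assumptions made for this example.

```python
from typing import Dict, List


def flag_suspect_labels(records: List[Dict], min_confidence: float = 0.9) -> List[str]:
    """Return ids of records where a confident prediction disagrees with the label."""
    suspects = []
    for record in records:
        disagrees = record["prediction"] != record["label"]
        if disagrees and record["confidence"] >= min_confidence:
            suspects.append(record["id"])
    return suspects


records = [
    {"id": "img_1", "label": "cat", "prediction": "cat", "confidence": 0.97},
    {"id": "img_2", "label": "cat", "prediction": "dog", "confidence": 0.95},
    {"id": "img_3", "label": "dog", "prediction": "cat", "confidence": 0.55},
]
suspects = flag_suspect_labels(records)
```

A flagged record is only a candidate for review: the model may be the one that is wrong, so the final call stays with a human.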
Pre-labeling, interactive automations, and QA tools will get you far. After covering these foundations, I’ll walk through key aspects of data exploration and discovery. What if you could query the data and label only the most relevant parts? This area includes concepts like filtering an unknown dataset down to a manageable size, and more.
I will touch on data augmentation, common ways it’s used, and cautions to be aware of. When we augment data, we derive new data based on the existing base information. From that viewpoint, it’s easier to think of the base information as the core training data and ...