Reliable Machine Learning

by Cathy Chen, Niall Richard Murphy, Kranti Parisa, D. Sculley, Todd Underwood
September 2022
Content level: Intermediate to advanced
408 pages
12h 49m
English
O'Reilly Media, Inc.

Chapter 4. Feature and Training Data

It should be clear by this point that models come from data. This chapter is about the data: how it is created, processed, annotated, stored, and ultimately used to create the model. You will see that managing and handling the data creates specific challenges for repeatability, manageability, and reliability, and we will make some concrete recommendations about how to approach those challenges. For background, see Chapters 2 and 3 if you haven’t already.

This chapter covers the infrastructure that accepts data from a source and readies it for use by the training system. We will discuss three fundamental functional subsystems involved in this task: a feature system, a system for human annotations, and a metadata system. We discussed features a little in the previous chapter; another way of thinking about them is that they are characteristics of the input data, especially characteristics that we have determined are predictive of something we care about. Labels are specific cases of the output that we want from the model that we ultimately train. They are used as examples to train that model. Another way to think about labels is that they are the target or “correct” values for a specific data instance that the model will learn. Labels can be extracted from logs by correlating the data with another independent event, or they can be generated by humans. We’ll discuss the systems needed for generation ...
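To make the idea of extracting labels from logs more concrete, here is a minimal sketch in Python. It is not taken from the book: the `Impression` and `Click` record types, the `label_impressions` function, the feature names, and the one-hour attribution window are all hypothetical choices made for illustration. The sketch treats each logged impression as a data instance with features, and assigns a label of 1 if an independent click event with the same ID arrived within the window, and 0 otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """Hypothetical log record: one item shown to a user, with its features."""
    impression_id: str
    timestamp: float  # seconds since some epoch
    features: dict


@dataclass(frozen=True)
class Click:
    """Hypothetical independent event that may follow an impression."""
    impression_id: str
    timestamp: float


def label_impressions(impressions, clicks, window_seconds=3600.0):
    """Return (features, label) pairs.

    The label is 1 if a click with the same impression_id occurred at or
    after the impression and within window_seconds of it, otherwise 0.
    The window length is an assumption, not a recommendation.
    """
    # Index click timestamps by impression ID for fast lookup.
    clicks_by_id = {}
    for click in clicks:
        clicks_by_id.setdefault(click.impression_id, []).append(click.timestamp)

    examples = []
    for imp in impressions:
        click_times = clicks_by_id.get(imp.impression_id, [])
        clicked = any(
            0 <= t - imp.timestamp <= window_seconds for t in click_times
        )
        examples.append((imp.features, 1 if clicked else 0))
    return examples


impressions = [
    Impression("a", 100.0, {"position": 1, "query_length": 3}),
    Impression("b", 200.0, {"position": 2, "query_length": 5}),
    Impression("c", 300.0, {"position": 1, "query_length": 2}),
]
# "a" is clicked 30 seconds later; "c" is clicked two hours later,
# which falls outside the one-hour window; "b" is never clicked.
clicks = [Click("a", 130.0), Click("c", 300.0 + 7200.0)]

examples = label_impressions(impressions, clicks)
for features, label in examples:
    print(features, label)
```

In this sketch the features describe the input (position, query length) and the label is the target value derived from a separate event stream. A real pipeline would also need to decide how long to wait for late-arriving events before finalizing a label.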

Publisher Resources

ISBN: 9781098106218