Date: 2022-10-25
Book description
Your training data has as much to do with the success of your data project as the algorithms themselves; most failures in deep learning systems relate to training data. But while training data is the foundation for successful machine learning, there are few comprehensive resources to help you ace the process. This hands-on guide explains how to work with and scale training data. You'll gain a solid understanding of the concepts, tools, and processes needed to:
- Design, deploy, and ship training data for production-grade deep learning applications
- Integrate with a growing ecosystem of tools
- Recognize and correct new training data-based failure modes
- Improve existing system performance and avoid development risks
- Confidently use automation and acceleration approaches to more effectively create training data
- Avoid data loss by structuring metadata around created datasets
- Clearly explain training data concepts to subject matter experts and other stakeholders
- Successfully maintain, operate, and improve your system
Table of contents
1. Training Data Introduction
- What is Training Data?
- Concepts Introduction
- Representations
- Choices
- Who Supervises the Data
- Sets of Assumptions
- Randomness
- Processes and Process Automation
- Supervision Automation and Tooling
- Dataset Construction & Maintenance
- Relevancy
- Integrated System Design
- What-To-Label
- Transfer Learning
- Per Sample Judgement Calls
- Ethical & Privacy Considerations
- Why Training Data Matters for Supervised Learning
- Contexts in Training Data: Classic and Supervised
- Training Data Sample Creation
- Training Data Process Introduction
- Training Data Management
- Challenges Introduction
- Summary
2. Training Data Concepts
- Schema Deep Dive Introduction
- What is it? Labels & Attributes
- Where is it? - Spatial Representation
- When is it? - Relationships, Sequences, Time Series
- Guides, Instructions
- Relation of Machine Learning Tasks to Training Data
- General Concepts
- Advanced concepts
- Raw Data Concepts
- Summary
3. Annotation Literal Concepts
- Chapter Organization: Administrators and Annotators
- Partnering with non-software users in new ways
- Administrators Process Overview
- Introduction to Annotation Tools
- Importing Data & Data Prep
- Define your Schema - what you want to label.
- Create Tasks for your Annotators.
- TK: Your annotators view the images and do the annotation
- TK: Export
- Quality Assurance
- Automations
- Semantic Segmentation
- Video
- Common Issues in annotation
- Summary
4. The Day-to-Day Practices of Training Data
- Introduction
- Ingest
- Store
- Workflow
- Annotation
- Annotation Automation
- Stream to Training
- Explore & Debug Data
- TK: Secure & Private Data
- Summary
5. Annotation Automation
- Introduction
- Getting Started
- Pre-Labeling
- Interactive Annotation Automation
- Quality Assurance (QA) Automation
- Data Discovery - What to Label Exploration
- Simulation & Synthetic Data
- Media Specific
- Augmentation
- Domain Specific
6. Tools
- Introduction
- Why Training Data Tools
- What do Training Data Tools Do?
- Best practices and levels of competency
- Human Computer Supervision
- Tools Bring Clarity
- Understanding the Importance of Tooling
- Realizing the Need for Dedicated Tooling
- More Usage, More Demands
- Advent of New Standards
- Journey to the Suite
- Open Source Standards
- A paradigm to deliver machine learning software
- Scale
- Scope
- Tooling quickstart
- Training Data Tooling Hidden Assumptions
- Security
- Open Source and Closed Source
- Deployment
- Costs
- Annotation Interfaces
- Integrations
- Ease of Use
- Installation and organization
- Configuration Choices
- Bias in training data
- Metadata
7. AI Transformation
- AI Transformation Introduction
- Getting Started
- The Creative Revolution of Data Centric AI
- Appoint a Leader: a Director of Training Data
- Go From a Work Pool to Standard Expectation for All
- Sometimes Proposals and Corrections, Sometimes Replacement
- Upstream Producers and Downstream Consumers
- Reading this Chart
- Spectrum of Training Data Team Engagement
- Dedicated Producers and Other Teams
- Organizing Producers from Other Teams
- Securing your AI Future
- Use Case Discovery
- Rethink AI Annotation Talent - quality over quantity
- Adopt Modern Training Data Tools
- About the Author
Product information
- Title: Training Data for Machine Learning
- Author(s):
- Release date: October 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492094524
