Most data scientists and engineers today rely on quality labeled data to train machine learning models. But building a training set manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There's a more practical approach. In this book, Wee Hyong Tok, Amit Bahree, and Senja Filipi show you how to create products using weakly supervised learning models.
You'll learn how to build natural language processing and computer vision projects using weakly labeled datasets from Snorkel, a spin-off from the Stanford AI Lab. Because so many companies have pursued ML projects that never go beyond their labs, this book also provides a guide on how to ship the deep learning models you build.
- Get up to speed on the field of weak supervision, including ways to use it as part of the data science process
- Use Snorkel AI for weak supervision and data programming
- Get code examples for using Snorkel to label text and image datasets
- Use a weakly labeled dataset for text and image classification
- Learn practical considerations for using Snorkel with large datasets and using Spark clusters to scale labeling
- Foreword by Xuedong Huang
- Foreword by Alex Ratner
- Preface
- 1. Introduction to Weak Supervision
- 2. Diving into Data Programming with Snorkel
  - Snorkel, a Data Programming Framework
  - Getting Started with Labeling Functions
  - Reaching Labeling Consensus with LabelModel
  - Strategies to Improve the Labeling Functions
  - Data Augmentation with Snorkel Transformers
  - Summary
- 3. Labeling in Action
  - Labeling a Text Dataset: Identifying Fake News
  - Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
  - Summary
- 4. Using the Snorkel-Labeled Dataset for Text Classification
  - Getting Started with Natural Language Processing (NLP)
  - Hard Versus Probabilistic Labels
  - Using ktrain for Performing Text Classification
  - Using Hugging Face and Transformers
  - Summary
- 5. Using the Snorkel-Labeled Dataset for Image Classification
- 6. Scalability and Distributed Training
  - The Need for Scalability
  - Distributed Training
  - Apache Spark: An Introduction
  - Using Azure Databricks to Scale
  - Fake News Detection Dataset on Databricks
  - Summary
- Index
- Title: Practical Weak Supervision
- Author(s): Wee Hyong Tok, Amit Bahree, Senja Filipi
- Release date: October 2021
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492077060
