Video description
One of the key bottlenecks in building ML systems is creating and managing the massive training datasets that today’s models learn from.
Alex Ratner outlines work on Snorkel, an open source framework for building and managing training datasets, and details three key operators that let users build and manipulate those datasets: labeling functions for labeling unlabeled data, transformation functions for expressing data augmentation strategies, and slicing functions for partitioning and structuring training datasets. These operators allow domain experts to specify ML models via noisy operators over training data, so applications can be built in hours or days rather than months or years. Alex explores recent work on modeling the noise and imprecision inherent in these operators and on using these approaches to train ML models that solve real-world problems, including a recent state-of-the-art result on the SuperGLUE natural language processing benchmark.
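To make the three operators concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API. Every name in it (`lf_contains_link`, `tf_swap_greeting`, `sf_has_question`, the spam/ham task, and the label constants) is a hypothetical illustration. The simple majority vote is only a stand-in for the noise modeling covered in the session, not the method Snorkel uses.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


# Labeling functions: noisy heuristics that vote on a label or abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]


def majority_vote(text):
    """Combine labeling-function votes; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


# Transformation function: a label-preserving edit used for data augmentation.
def tf_swap_greeting(text):
    return text.replace("Hello", "Hi")


# Slicing function: flags a subset of the data to track or model separately.
def sf_has_question(text):
    return "?" in text
```

Calling `majority_vote` on each unlabeled message yields a noisy training label, `tf_swap_greeting` produces an augmented copy of an example, and `sf_has_question` marks the examples that belong to a slice of interest.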
Prerequisite knowledge
- A basic understanding of machine learning
What you'll learn
- Discover techniques for building, managing, and iterating on training datasets and modeling pipelines for ML, both in general and with the Snorkel framework
This session is from the 2019 O'Reilly Artificial Intelligence Conference in San Jose, CA.
Product information
- Title: Building and managing training datasets for ML with Snorkel
- Author(s): Alex Ratner
- Release date: February 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 0636920370796