8
Weakly Supervised Learning for Classification with Snorkel
Models such as BERT and GPT use massive amounts of unlabeled data together with an unsupervised training objective, such as masked language modeling (MLM) for BERT or next-word prediction for GPT, to learn the underlying structure of text. A small amount of task-specific data is then used to fine-tune the pre-trained model via transfer learning. Such models are quite large, with hundreds of millions of parameters, and they require massive datasets for pre-training and considerable compute for both pre-training and fine-tuning. Note that the critical problem being solved is the lack of adequate training data. If there were enough domain-specific training data, the gains from BERT-like ...
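To make the fine-tuning step described above concrete, here is a minimal sketch that loads a pre-trained BERT model and adapts it to a small labeled classification set using the Hugging Face transformers library with TensorFlow 2. The checkpoint name, the toy texts and labels, and the hyperparameters are illustrative assumptions, not values from this chapter.

```python
# A minimal sketch of fine-tuning a pre-trained BERT classifier in TensorFlow 2.
# The checkpoint, example texts/labels, and hyperparameters are assumptions for
# illustration only; they are not taken from this chapter.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled set standing in for the "small amount of task-specific data".
texts = ["the movie was wonderful", "a dull and tedious film"]
labels = [1, 0]

# Tokenize the texts into the padded integer inputs BERT expects.
encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")

# Fine-tune: a few epochs with a small learning rate on top of the
# pre-trained weights (transfer learning).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(encodings), tf.constant(labels), epochs=2, batch_size=2)
```

In practice the labeled set would be far larger than two examples; the point of the sketch is that only this last, comparatively cheap step touches labeled data, which is exactly why the scarcity of labels becomes the bottleneck that weak supervision with Snorkel aims to address.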