10 ALBERT, adapters, and multitask adaptation strategies

This chapter covers

Applying embedding factorization and parameter sharing across layers
Fine-tuning a model from the BERT family on multiple tasks
Splitting a transfer learning experiment into multiple steps
Applying adapters to a model from the BERT family
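
To give a feel for the first item before we get into details, here is a minimal sketch of the parameter arithmetic behind embedding factorization and cross-layer parameter sharing. The function names and the sizes used (a vocabulary of 30,000, a hidden size of 768, an embedding size of 128, 12 layers) are illustrative assumptions rather than values taken from this chapter, and `per_layer_params` is a hypothetical placeholder count.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters, with or without factorization.

    Without factorization, the embedding matrix is vocab_size x hidden_size.
    With factorization, it is split into a vocab_size x embedding_size matrix
    followed by an embedding_size x hidden_size projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params, num_layers, share_across_layers):
    """Count encoder parameters; sharing reuses one layer's weights everywhere."""
    if share_across_layers:
        return per_layer_params
    return per_layer_params * num_layers


# Illustrative sizes only (assumptions, not figures from the chapter).
V, H, E, LAYERS = 30_000, 768, 128, 12

unfactorized = embedding_params(V, H)    # 23,040,000
factorized = embedding_params(V, H, E)   # 3,938,304

per_layer = 7_000_000  # hypothetical per-layer parameter count
unshared = encoder_params(per_layer, LAYERS, share_across_layers=False)
shared = encoder_params(per_layer, LAYERS, share_across_layers=True)

print(f"embeddings: {unfactorized:,} -> {factorized:,}")
print(f"encoder:    {unshared:,} -> {shared:,}")
```

The savings come from two independent choices: a small embedding size shrinks the vocabulary-sized matrix, and sharing makes the encoder cost independent of depth in terms of stored parameters, though not in terms of compute.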

In the previous chapter, we began covering adaptation strategies for the deep NLP transfer learning architectures discussed so far. The question driving these strategies is this: given a pretrained architecture such as ELMo, BERT, or GPT, how can transfer learning be carried out more efficiently? We covered two critical ideas behind the ULMFiT method: discriminative fine-tuning and gradual unfreezing.
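
As a quick refresher on those two ideas, the following framework-free sketch computes the schedules they imply. It is a rough illustration under stated assumptions, not the code from the previous chapter: the function names are hypothetical, the decay factor of 2.6 is the value suggested in the ULMFiT paper rather than anything stated here, and "one more layer per epoch" is one common unfreezing policy among several.

```python
def discriminative_learning_rates(num_layers, top_lr, decay=2.6):
    """Return one learning rate per layer, lowest layer first.

    The top layer trains at top_lr; each layer below it trains at the
    rate of the layer above divided by `decay` (assumed value: 2.6).
    """
    return [top_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)]


def gradual_unfreezing_schedule(num_layers, num_epochs):
    """Return, for each epoch, the indices of the trainable layers.

    Epoch 0 trains only the top layer; each later epoch unfreezes one
    more layer from the top down until every layer is trainable.
    """
    schedule = []
    for epoch in range(num_epochs):
        n_trainable = min(epoch + 1, num_layers)
        schedule.append(list(range(num_layers - n_trainable, num_layers)))
    return schedule


# Hypothetical 3-layer model trained for 4 epochs.
rates = discriminative_learning_rates(num_layers=3, top_lr=0.01)
schedule = gradual_unfreezing_schedule(num_layers=3, num_epochs=4)

for layer, lr in enumerate(rates):
    print(f"layer {layer}: lr = {lr:.6f}")
for epoch, layers in enumerate(schedule):
    print(f"epoch {epoch}: trainable layers = {layers}")
```

In a real training loop, the per-layer rates would typically be passed to the optimizer as separate parameter groups, and the schedule would decide which layers have gradients enabled at the start of each epoch.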

The ...


Transfer Learning for Natural Language Processing by Paul Azunre
