Summary
In this chapter, we focused on seq2seq models and the attention mechanism. First, we discussed and implemented a regular recurrent encoder-decoder seq2seq model and learned how to complement it with the attention mechanism. Then, we talked about and implemented a purely attention-based type of model called the transformer, and we defined multi-head attention in that context. Next, we discussed transformer language models (such as BERT, Transformer-XL, and XLNet). Finally, we implemented a simple text-generation example using the transformers library, sketched below.
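As a brief refresher, the following is a minimal sketch of that kind of text generation with the transformers library; it assumes the pretrained gpt2 checkpoint and is not necessarily the chapter's exact example.

```python
# A minimal text-generation sketch with the Hugging Face transformers library
# (assumes `pip install transformers` and the pretrained "gpt2" checkpoint).
from transformers import pipeline

# Create a text-generation pipeline backed by GPT-2.
generator = pipeline("text-generation", model="gpt2")

# Generate a continuation of the prompt; max_length caps the total token count.
outputs = generator("The attention mechanism allows the decoder to",
                    max_length=40,
                    num_return_sequences=1)

print(outputs[0]["generated_text"])
```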
This chapter concludes our series of chapters focused on natural language processing. In the next chapter, we'll talk about some new trends in deep learning that haven't fully matured ...