
3. Transformers

Thimira Amaratunga, Nugegoda, Sri Lanka

In 2017, Ashish Vaswani et al. of Google Brain and Google Research proposed a revolutionary new neural network architecture for natural language processing (NLP) and other sequence-to-sequence tasks in their paper "Attention Is All You Need." Their approach relies heavily on attention mechanisms to process sequences, enabling parallelization, efficient training, and the ability to capture long-range dependencies in data.
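
To make the core idea concrete, the following is a minimal NumPy sketch of the scaled dot-product attention that the paper builds on. The formula, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, is from the paper itself; the function name and the toy self-attention example below are illustrative, not the book's own code.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
    d_k = Q.shape[-1]
    # Score every query against every key; scaling by sqrt(d_k) keeps
    # the softmax from saturating as the dimensionality grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is an attention-weighted average of the value vectors.
    return weights @ V

# Toy self-attention: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)

Because every output position attends to all positions in a handful of matrix multiplications, rather than through a step-by-step recurrent chain, the whole sequence can be processed in parallel, which is the source of the training efficiency described above.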

This new architecture proved extremely ...
