16. Self-Attention
Where does self-attention get its name, and how is it different from previously developed attention mechanisms?
Self-attention enables a neural network to refer to other portions of the input while focusing on a particular segment, essentially giving each part of the input the ability to “attend” to the whole input. The original attention mechanism developed for recurrent neural networks (RNNs) is applied between two different sequences: the encoder and the decoder embeddings. Since the attention mechanism used in transformer-based large language models operates on all elements of the same sequence, it is known as self-attention. ...
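To make this distinction concrete, the following is a minimal sketch of scaled dot-product self-attention for a single attention head, assuming PyTorch and randomly initialized toy inputs and projection matrices (the tensor sizes and variable names here are illustrative, not taken from the text):

```python
import torch

torch.manual_seed(123)

# Toy input: a sequence of 4 tokens, each embedded in 8 dimensions
seq_len, d_in, d_out = 4, 8, 8
x = torch.randn(seq_len, d_in)

# Learnable projection matrices for queries, keys, and values
W_q = torch.nn.Parameter(torch.randn(d_in, d_out))
W_k = torch.nn.Parameter(torch.randn(d_in, d_out))
W_v = torch.nn.Parameter(torch.randn(d_in, d_out))

# In self-attention, queries, keys, and values are all derived from
# the SAME sequence x; in the original RNN attention, the queries
# would come from the decoder and the keys/values from the encoder.
queries = x @ W_q
keys = x @ W_k
values = x @ W_v

# Each token's query is compared against every token's key,
# so every position can "attend" to the whole input.
scores = queries @ keys.T                          # shape: (4, 4)
weights = torch.softmax(scores / d_out ** 0.5, dim=-1)

# Each output is a weighted mixture of all value vectors.
context = weights @ values                          # shape: (4, 8)
print(context.shape)
```

Because the attention weights form a 4x4 matrix here, every token receives a context vector that mixes information from every other token in the same sequence, which is exactly what "self" refers to in self-attention.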