The word attention is derived from the Latin attentionem, meaning to give heed to or require one’s focus. It’s a word used to demand people’s focus, from military instructors to teachers and parents. Yet, it is a word we also use in mathematics and computer science to describe how well something attends or links to something else.
In 2017, the pivotal paper “Attention Is All You Need” took this concept a step further and introduced the application of attention to deep learning for the purpose of understanding how well features attend to other features. The authors of the paper were ...