
[Ainslie et al. 2023] Ainslie, Joshua, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy,
Federico Lebron, and Sumit Sanghai (Dec. 2023). "GQA: Training Generalized Multi-Query
Transformer Models from Multi-Head Checkpoints." In: Proceedings of the 2023 Conference
on Empirical Methods in Natural Language Processing. Ed. by Houda Bouamor, Juan Pino,
and Kalika Bali. Singapore: Association for Computational Linguistics, pp. 4895–4901.
[Ba et al. 2016] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton (2016). Layer
Normalization.
[Birr et al. 2024] Birr, Timo, Christoph Pohl, Abdelrahman Younes, and Tamim Asfour
(2024). Auto-GPT+P: Affordance ...