
[Pascanu et al. 2013] Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio (2013). "On the
difficulty of training recurrent neural networks."
[Polyak 1964] Polyak, B.T. (1964). "Some methods of speeding up the convergence of
iteration methods." In: USSR Computational Mathematics and Mathematical Physics 4.5, pp.
1-17. ISSN: 0041-5553.
[Prechelt 2002] Prechelt, Lutz (2002). "Early stopping - but when?" In: Neural Networks:
Tricks of the trade. Springer, pp. 55-69.
[Radford et al. 2019] Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei,
Ilya Sutskever, et al. (2019). "Language models are unsupervised multitask learners." In: ...