References
Chapter 1: The Bandit Problem
[1] Van der Maaten, Laurens, and Geoffrey Hinton. “Visualizing data using t-SNE.”
Journal of machine learning research 9.11 (2008).
[2] Auer, Peter, Nicolo Cesa-Bianchi, and Paul Fischer. “Finite-time analysis of the multiarmed bandit problem.”
Machine learning 47.2 (2002): 235-256.
[3] Williams, Ronald J. “Simple statistical gradient-following algorithms for connectionist reinforcement learning.”
Machine learning 8.3 (1992): 229-256.
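Chapter 2: Markov Decision Processes
[4] Szepesvári, Csaba. “速習 強化学習 ―基礎理論とアルゴリズム” (Japanese edition of “Algorithms for Reinforcement Learning”). Kyoritsu Shuppan.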
Chapter 4: Dynamic Programming
[5] Sutton, Richard S., and Andrew G. Barto. “Reinforcement learning: An introduction.” MIT press, 2018.
Chapter 7: Neural Networks and Q-Learning
[6] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning ...