References

Chapter 1: Bandit Problems

[1] Van der Maaten, Laurens, and Geoffrey Hinton. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.11 (2008).
[2] Auer, Peter, Nicolò Cesa-Bianchi, and Paul Fischer. “Finite-time analysis of the multiarmed bandit problem.” Machine Learning 47.2 (2002): 235-256.
[3] Williams, Ronald J. “Simple statistical gradient-following algorithms for connectionist reinforcement learning.” Machine Learning 8.3 (1992): 229-256.

Chapter 2: Markov Decision Processes

[4] Szepesvári, Csaba. 速習 強化学習 ―基礎理論とアルゴリズム [Crash Course in Reinforcement Learning: Basic Theory and Algorithms]. Kyoritsu Shuppan (共立出版).

Chapter 4: Dynamic Programming

[5] Sutton, Richard S., and Andrew G. Barto. “Reinforcement Learning: An Introduction.” MIT Press, 2018.

Chapter 7: Neural Networks and Q-Learning

[6] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning ...