Chapter 1: The Bandit Problem
[1] Van der Maaten, Laurens, and Geoffrey Hinton. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.11 (2008).
[2] Auer, Peter, Nicolo Cesa-Bianchi, and Paul Fischer. “Finite-time analysis of the multiarmed bandit problem.” Machine Learning 47.2 (2002): 235-256.
[3] Williams, Ronald J. “Simple statistical gradient-following algorithms for connectionist reinforcement learning.” Machine Learning 8.3 (1992): 229-256.
Chapter 2: Markov Decision Processes
[4] Szepesvari, Csaba. 『速習 強化学習 ―基礎理論とアルゴリズム』 (Kyoritsu Shuppan; Japanese edition of “Algorithms for Reinforcement Learning”).
Chapter 4: Dynamic Programming
[5] Sutton, Richard S., and Andrew G. Barto. “Reinforcement learning: An introduction.” MIT Press, 2018.
Chapter 7: Neural Networks and Q-Learning
[6] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning ...