References

Chapter 1: The Bandit Problem

[1] Van der Maaten, Laurens, and Geoffrey Hinton. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.11 (2008).

[2] Auer, Peter, Nicolo Cesa-Bianchi, and Paul Fischer. “Finite-time analysis of the multiarmed bandit problem.” Machine Learning 47.2 (2002): 235-256.

[3] Williams, Ronald J. “Simple statistical gradient-following algorithms for connectionist reinforcement learning.” Machine Learning 8.3 (1992): 229-256.

Chapter 2: Markov Decision Processes

[4] Szepesvari, Csaba. 『速習強化学習 ―基礎理論とアルゴリズム』. Kyoritsu Shuppan.

Chapter 4: Dynamic Programming

[5] Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

Chapter 7: Neural Networks and Q-Learning

[6] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning ...

ゼロから作るDeep Learning ❹ ―強化学習編 by 斎藤康毅