Deep Reinforcement Learning Hands-On
by Oleg Vasilev, Maxim Lapan, Martijn van Otterlo, Mikhail Yurushkin, Basem O. F. Alijla
Connect4 results
To keep training fast, the hyperparameters of the training process were intentionally kept small. For example, at every step of the self-play process, only 10 MCTS searches were performed, each with a minibatch size of eight. Combined with the efficient minibatch MCTS and the fast game engine, this made training very quick. After just one hour of training and 2,500 games of self-play, the resulting model was sophisticated enough to be enjoyable to play against. Its level of play was still well below even a kid's level, but it showed some rudimentary strategies and made a mistake only on every other move, which was good progress.
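To put these numbers in perspective, the short sketch below works out the search budget per move and the average time per game implied by the figures above. The constant and function names are illustrative assumptions, not taken from the book's code, and the per-move figure is an upper bound that assumes every minibatch collects a full set of eight leaves.

```python
# Figures quoted in the text; the names are hypothetical, chosen for illustration.
MCTS_SEARCHES = 10      # MCTS searches performed per self-play step
MCTS_BATCH_SIZE = 8     # leaves gathered per search (minibatch size)


def max_leaves_per_move(searches: int, batch_size: int) -> int:
    """Upper bound on leaves evaluated per move, assuming full minibatches."""
    return searches * batch_size


def seconds_per_game(games: int, hours: float) -> float:
    """Average wall-clock seconds spent per self-play game."""
    return hours * 3600 / games


if __name__ == "__main__":
    print(max_leaves_per_move(MCTS_SEARCHES, MCTS_BATCH_SIZE))
    print(seconds_per_game(games=2500, hours=1))
```

Under these assumptions, each move is backed by at most 80 leaf evaluations, and 2,500 games in one hour works out to roughly 1.4 seconds per game, which is why the resulting play is quick to train but still shallow.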
The training was left running for a day, which resulted in 55k ...