There were five changes from the previous version of AlphaGo. They were as follows:
- Trains entirely from self-play, that is, with no data from human expert games, learning everything from scratch. Earlier versions used a supervised-learning policy network trained on expert games.
- No hand-crafted features.
- Replaced the plain convolutional architecture with a residual convolutional architecture.
- Instead of separate policy and value networks, AlphaGo Zero combines both into a single large network (see the sketch after this list).
- Simplified the Monte Carlo Tree Search, which uses this single large network to evaluate positions during its simulations.
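
A minimal sketch of the last three points, assuming PyTorch (the framework, layer widths, and block count here are illustrative assumptions, not the paper's exact configuration): residual convolutional blocks form a shared trunk, and a single network exposes both a policy head and a value head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD_SIZE = 19      # the Go board is 19 x 19
NUM_PLANES = 17      # assumed number of input feature planes (see input description below)
NUM_FILTERS = 64     # illustrative width; the real network is much larger

class ResidualBlock(nn.Module):
    """Two conv layers with batch norm and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # residual (skip) connection

class PolicyValueNet(nn.Module):
    """Single network with a shared residual trunk and two heads."""
    def __init__(self, num_blocks=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(NUM_PLANES, NUM_FILTERS, 3, padding=1, bias=False),
            nn.BatchNorm2d(NUM_FILTERS),
            nn.ReLU(),
        )
        self.trunk = nn.Sequential(*[ResidualBlock(NUM_FILTERS) for _ in range(num_blocks)])
        # Policy head: a probability for each board point plus the pass move.
        self.policy_head = nn.Sequential(
            nn.Conv2d(NUM_FILTERS, 2, 1), nn.BatchNorm2d(2), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * BOARD_SIZE * BOARD_SIZE, BOARD_SIZE * BOARD_SIZE + 1),
        )
        # Value head: a scalar evaluation of the position in [-1, 1].
        self.value_head = nn.Sequential(
            nn.Conv2d(NUM_FILTERS, 1, 1), nn.BatchNorm2d(1), nn.ReLU(), nn.Flatten(),
            nn.Linear(BOARD_SIZE * BOARD_SIZE, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.trunk(self.stem(x))
        return F.log_softmax(self.policy_head(h), dim=1), self.value_head(h)

# One forward pass gives both move probabilities and a position value,
# which is what the simplified tree search queries at each simulated position.
net = PolicyValueNet()
planes = torch.zeros(1, NUM_PLANES, BOARD_SIZE, BOARD_SIZE)
log_policy, value = net(planes)
```

Because the value head replaces separate rollouts, the tree search only needs this one network call per evaluated position.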
The network input consists of:
- A 19 x 19 plane representing the Go board
- One feature ...