Q-learning using a neural network

Now, we want to test the Q-learning algorithm using a smaller checkerboard environment and a neural network (with Keras). The main difference from the previous examples is that the state is now represented by a screenshot of the current configuration; hence, the model has to learn how to associate a value with each input image and action. This isn't actual deep Q-learning (which is based on Deep Convolutional Networks and requires more complex environments than we can discuss in this book), but it shows how such a model can learn an optimal policy from the same input that would be provided to a human being. In order to reduce the training time, we are considering a square checkerboard environment, with four negative ...
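To make the idea concrete, the following is a minimal sketch (not the book's exact code) of how a small Keras network can map a grayscale screenshot of the checkerboard to one Q-value per action, with an epsilon-greedy policy and a single Q-learning update step. The environment dimensions (width, height), the number of actions, the hyperparameters, and the helper functions select_action and train_step are illustrative assumptions.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Hypothetical dimensions: a small square checkerboard rendered as a
# grayscale screenshot (height x width), with four possible moves.
width, height = 5, 5
n_actions = 4

gamma = 0.95   # discount factor (assumed value)
epsilon = 0.1  # epsilon-greedy exploration rate (assumed value)

# A small feed-forward network mapping the screenshot of the current
# configuration to one Q-value per action.
model = Sequential([
    Flatten(input_shape=(height, width)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(n_actions, activation='linear')
])
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')


def select_action(state):
    # Epsilon-greedy policy over the predicted Q-values
    if np.random.uniform() < epsilon:
        return np.random.randint(n_actions)
    q_values = model.predict(state[np.newaxis, ...], verbose=0)[0]
    return int(np.argmax(q_values))


def train_step(state, action, reward, next_state, done):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a')
    q_values = model.predict(state[np.newaxis, ...], verbose=0)[0]
    if done:
        q_values[action] = reward
    else:
        next_q = model.predict(next_state[np.newaxis, ...], verbose=0)[0]
        q_values[action] = reward + gamma * np.max(next_q)
    # Fit the network toward the updated target for the taken action
    model.fit(state[np.newaxis, ...], q_values[np.newaxis, ...], verbose=0)

In a training loop, each episode would repeatedly call select_action on the current screenshot, apply the move to the environment, and then call train_step with the observed reward and the next screenshot until an absorbing state is reached.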