Summary
This chapter covers the basic principles of Reinforcement Learning and the fundamental Q-learning algorithm.
The distinctive feature of Q-learning is its capacity to weigh immediate rewards against delayed rewards. In its simplest form, Q-learning stores its value estimates in a table, with one entry per state-action pair. This approach quickly becomes impractical as the state/action space of the system being monitored or controlled grows.
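As a rough illustration of the tabular approach, the sketch below runs Q-learning on a made-up three-state chain. The environment, the `step` and `train` helpers, the reward values, and the hyperparameters are all assumptions chosen for this example, not code from the chapter. Stopping pays a small reward immediately, while continuing to the end of the chain pays a larger reward later.

```python
import numpy as np

# Hypothetical toy chain: states 0, 1, 2; reaching "state 3" ends the episode.
N_STATES, N_ACTIONS = 3, 2
STOP, GO = 0, 1


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == STOP:
        return state, 1.0, True        # small immediate reward, episode ends
    next_state = state + 1
    if next_state == N_STATES:
        return next_state, 10.0, True  # large delayed reward at the end of the chain
    return next_state, 0.0, False


def train(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning with purely random exploration (values are illustrative)."""
    rng = np.random.default_rng(seed)
    q_table = np.zeros((N_STATES, N_ACTIONS))  # one entry per state-action pair
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = int(rng.integers(N_ACTIONS))
            next_state, reward, done = step(state, action)
            # Terminal transitions have no future value to bootstrap from.
            best_next = 0.0 if done else np.max(q_table[next_state])
            td_target = reward + gamma * best_next
            q_table[state, action] += alpha * (td_target - q_table[state, action])
            state = next_state
    return q_table


q_table = train()
print(q_table)
```

Under these assumptions, the learned value of GO in state 0 should end up higher than that of STOP, because the discounted delayed reward outweighs the immediate one. The table has only six entries here; the number of entries is the number of states times the number of actions, which is why the table becomes unmanageable for large or continuous state spaces.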
We can overcome this problem by using a neural network as a function approximator, which takes the state and action as input, and outputs the corresponding Q-value.
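To make the idea of a function approximator concrete, here is a minimal sketch of a one-hidden-layer network that takes a state vector and a one-hot encoded action as input and returns a single Q-value. It is written in plain NumPy as a stand-in and is not the chapter's TensorFlow implementation. The class name `TinyQNetwork`, the layer size, the learning rate, and the four-dimensional example state are all assumptions made for illustration.

```python
import numpy as np


class TinyQNetwork:
    """Hypothetical Q-value approximator: (state, action) -> scalar Q-value."""

    def __init__(self, state_dim, n_actions, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.n_actions = n_actions
        in_dim = state_dim + n_actions  # state concatenated with one-hot action
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = 0.0

    def _features(self, state, action):
        one_hot = np.zeros(self.n_actions)
        one_hot[action] = 1.0
        return np.concatenate([np.asarray(state, dtype=float), one_hot])

    def q_value(self, state, action):
        x = self._features(state, action)
        h = np.tanh(x @ self.w1 + self.b1)
        return float(h @ self.w2 + self.b2)

    def update(self, state, action, target, lr=0.05):
        """One gradient step on 0.5 * (Q(s, a) - target)^2."""
        x = self._features(state, action)
        h = np.tanh(x @ self.w1 + self.b1)
        q = h @ self.w2 + self.b2
        err = q - target
        grad_pre = err * self.w2 * (1.0 - h ** 2)  # backprop through tanh
        self.w2 -= lr * err * h
        self.b2 -= lr * err
        self.w1 -= lr * np.outer(x, grad_pre)
        self.b1 -= lr * grad_pre


net = TinyQNetwork(state_dim=4, n_actions=2)
state = [0.1, -0.2, 0.0, 0.3]      # made-up state vector
before = net.q_value(state, 1)
for _ in range(200):
    net.update(state, 1, target=1.0)  # the target would normally be r + gamma * max Q(s', a')
after = net.q_value(state, 1)
print(before, after)
```

In a full Q-learning loop, the fixed `target` used here would be replaced by the temporal-difference target computed from the observed reward and the network's own estimate for the next state. The network's parameter count depends on the layer sizes rather than on the number of distinct states, which is what lets it scale where a table cannot.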
Following this idea, we implemented a Q-learning neural network using the TensorFlow framework and the OpenAI Gym toolkit for developing and comparing Reinforcement Learning algorithms.