Keras Reinforcement Learning Projects
by Giuseppe Ciaburro, Sudharsan Ravichandiran, Suriyadeepan Ramamoorthy
Q-learning solution
Now we come to the most demanding phase: training our system. In the previous section, we said that the gym library focuses on the episodic setting of reinforcement learning, in which the agent's experience is divided into a series of episodes. The initial state of the agent is randomly sampled from a distribution, and the interaction proceeds until the environment reaches a terminal state. This procedure is repeated for each episode, with the aim of maximizing the expected total reward per episode and reaching a high level of performance in as few episodes as possible.
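The episodic procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the book's own code: `CorridorEnv` is a hypothetical toy environment whose `reset` and `step` methods merely imitate the style of a gym environment, and the `train` function, its tabular Q-learning update, and all hyperparameter values (`alpha`, `gamma`, `epsilon`, `max_steps`) are assumptions chosen for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment (not from the book or from gym).

    A corridor of n_states cells; reaching the right-most cell is terminal.
    """

    def __init__(self, n_states=6, seed=0):
        self.n_states = n_states
        self.n_actions = 2  # 0 = move left, 1 = move right
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # The initial state is sampled at random from the non-terminal cells.
        self.state = self.rng.randrange(self.n_states - 1)
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Small penalty per step, reward of 1 on reaching the terminal cell.
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def train(env, episodes=300, alpha=0.1, gamma=0.95, epsilon=0.1,
          max_steps=100, seed=0):
    """Run tabular Q-learning, one episode at a time (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    returns = []  # total reward collected in each episode
    for _ in range(episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            # Q-learning target: bootstrap from the best next action
            # unless the episode has reached a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            total += reward
            state = next_state
            if done:
                break
        returns.append(total)
    return q, returns


q_table, episode_returns = train(CorridorEnv())
```

Here `episode_returns` records the total reward of each episode, which is the quantity the training procedure tries to maximize, and `q_table` holds the learned action-value estimates. With a real gym environment, the return values of `reset` and `step` differ between library versions, so the loop would need to be adapted accordingly.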
In the learning phase, we must estimate an evaluation function. This function must be able to evaluate, through the sum of the rewards, the convenience ...