Q-learning solution

Now we face the most demanding phase: training our system. In the previous section, we said that the gym library focuses on the episodic setting of reinforcement learning, in which the agent's experience is divided into a series of episodes. The agent's initial state is randomly sampled from a distribution, and the interaction proceeds until the environment reaches a terminal state. This procedure is repeated for each episode, with the aim of maximizing the expected total reward per episode and reaching a high level of performance in as few episodes as possible.
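The episodic setting described above can be illustrated with a minimal sketch. It does not use the gym library itself: `CorridorEnv` is a hypothetical toy environment that only imitates a gym-like `reset`/`step` interface, and `run_episodes`, the step cap, and the random policy are assumptions made for illustration. No learning takes place here; the sketch only shows how episodes start from a sampled state, run to a terminal state, and yield a total reward each.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment with a gym-like reset/step interface."""

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.goal = n_states - 1          # the rightmost cell is the terminal state
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # Sample the initial state uniformly from the non-terminal cells.
        self.state = self.rng.randrange(self.goal)
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.goal)
        done = self.state == self.goal
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episodes(env, policy, n_episodes=10, max_steps=100):
    """Run each episode to a terminal state (or a step cap) and collect its total reward."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            state, reward, done = env.step(policy(state))
            total += reward
            if done:
                break
        returns.append(total)
    return returns


# A random policy stands in for the agent; training would replace it.
rng = random.Random(1)
random_policy = lambda state: rng.randrange(2)
episode_returns = run_episodes(CorridorEnv(), random_policy)
print(episode_returns)
```

Each entry of `episode_returns` is the total reward of one episode. A training procedure would aim to raise the average of these values while using as few episodes as possible.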

In the learning phase, we must estimate an evaluation function. This function must be able to evaluate, through the sum of the rewards, the convenience ...
