August 2019
We have said that, in RL, the learning process is guided by feedback, emulating a decision-making approach based on trial and error. In pursuing a goal such as finding the exit of a maze, the agent performs actions (moves), each of which elicits feedback (a reward or punishment) from the environment.
The feedback is emitted based on the state the agent reaches after each action, that is, the position it occupies after each move. This feedback flows from the environment back to the agent. As a consequence, the agent iteratively updates its predictions for future states based on the rewards received, weighing each subsequent action's chance of success with probabilistic estimates. By leveraging ...
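One concrete instance of this feedback loop is tabular Q-learning. The sketch below runs it on a toy one-dimensional "maze" (a corridor whose exit is the rightmost state); the state space, reward scheme, and hyperparameters are illustrative assumptions, not taken from the book:

```python
import random

# Hypothetical corridor "maze": states 0..4, exit at state 4.
# Actions: 0 = move left, 1 = move right.
# The environment emits feedback (next state, reward) after each move.
N_STATES, EXIT = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

def step(state, action):
    """Environment: apply the move, then emit the resulting state and reward."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == EXIT else 0.0)  # reward only on reaching the exit

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # Value estimates per (state, action): the agent's "predictions".
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != EXIT:
            # Trial and error: mostly exploit current estimates, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r = step(s, a)
            # Update the prediction for this state from the reward received.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
```

After training, the estimates for "move right" dominate those for "move left" in every non-terminal state, so greedy action selection walks straight to the exit.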