February 2020
Intermediate to advanced
432 pages
10h 50m
English
A) Via the experience that it gets from the reward it receives each time it executes an action. B) By randomly exploring the environment and discovering the best strategy by trial and error. C) Via a neural network that gives as output a Q-value as a function of the state of the system.
A) Yes; this is a characteristic called model-free RL. B) Only if it does not take the model-free RL approach. C) No; by definition, RL methods only need to be aware of rewards and penalties to ensure the learning process.