October 2019
Intermediate to advanced
366 pages
12h 4m
English
In the case that the actions are discrete and limited in number, the most common approach is to create a parameterized policy that produces a numerical value for each action.
Then, each output value is converted to a probability. This operation is performed with the softmax function, which is given as follows:

The softmax values are normalized to have a sum of one, so as to produce a probability distribution where each value corresponds to the probability of selecting a given action.
The next two plots ...
Read now
Unlock full access