April 2018
Intermediate to advanced
334 pages
10h 18m
English
The following table summarizes the dilemma between exploration and exploitation:
| Exploration | Exploitation |
| --- | --- |
| To choose other actions randomly apart from the current optimal action and hope to obtain a better reward. | To choose the current optimal action without trying other actions. |
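The two columns of the table can be read as two action-selection rules for a single state. The sketch below is illustrative only. It assumes the Q-values of that state are held in a plain Python list, and the helper names `exploit` and `explore` are hypothetical, not taken from any library. `explore` follows the table literally by picking at random among the actions *other than* the current optimal one.

```python
import random
from typing import Sequence


def exploit(q_values: Sequence[float]) -> int:
    """Return the current optimal action: the index with the highest Q-value."""
    # max() returns the first maximal index, so ties go to the lowest action.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def explore(q_values: Sequence[float], rng: random.Random) -> int:
    """Return a randomly chosen action other than the current optimal one."""
    best = exploit(q_values)
    others = [a for a in range(len(q_values)) if a != best]
    if not others:  # only one action exists, so there is nothing else to try
        return best
    return rng.choice(others)


q = [0.1, 0.5, 0.2]  # hypothetical Q-values for three actions in one state
rng = random.Random(0)
print(exploit(q))                 # prints 1
print(explore(q, rng) in {0, 2})  # prints True
```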
Thus, the dilemma is whether the AI should trust only the Q-values it has learned so far and take the actions prescribed by the current optimal policy, or try other actions at random in the hope of a better reward, which would improve the Q-values and thereby yield a better policy.
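One common way to balance the two is ε-greedy action selection. With probability ε the agent explores, and otherwise it exploits. This passage does not prescribe ε-greedy, so treat the following as a minimal sketch under stated assumptions. It uses a hypothetical three-armed bandit (a single state), and `run_bandit` is an illustrative name. Rewards are deterministic, equal to each arm's value, as a simplification. Note that standard ε-greedy draws the exploratory action uniformly over *all* actions, including the greedy one. The point it shows is the one made above: a purely greedy agent (ε = 0) can lock onto the first action that looks good and never update the Q-values of the others, while a small ε lets it discover the better action.

```python
import random
from typing import List, Sequence, Tuple


def run_bandit(arm_values: Sequence[float], epsilon: float, steps: int,
               seed: int = 0) -> Tuple[List[float], List[int]]:
    """Epsilon-greedy on a bandit; returns (learned Q-values, pull counts)."""
    rng = random.Random(seed)
    n = len(arm_values)
    q = [0.0] * n      # learned Q-value estimates, all starting at 0
    counts = [0] * n   # how many times each action has been taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                   # explore: any action
        else:
            action = max(range(n), key=lambda a: q[a])  # exploit: current best
        reward = arm_values[action]  # assumption: deterministic reward
        counts[action] += 1
        # Incremental sample-average update of the Q-value for this action
        q[action] += (reward - q[action]) / counts[action]
    return q, counts


# Pure exploitation: arm 0 is tried first, earns 1.0, and is never abandoned.
greedy_q, greedy_counts = run_bandit([1.0, 2.0, 5.0], epsilon=0.0, steps=1000)
print(greedy_counts)  # prints [1000, 0, 0]

# With a little exploration, arm 2 (value 5.0) is discovered and then exploited.
eps_q, eps_counts = run_bandit([1.0, 2.0, 5.0], epsilon=0.1, steps=1000)
print(eps_q, eps_counts)
```

The greedy run never updates the Q-values of arms 1 and 2, so its estimate of them stays at the initial 0. With ε = 0.1, the agent still spends roughly 10% of its steps on random actions. That exploration cost is the price of finding the better arm.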