A deep Q-network is trained with an ε-greedy approach, starting at ε = 1 (full 100% exploration) and decaying to ε = 0.1 (only 10% exploration, 90% exploitation) in steps of 0.1. During exploration, random actions are chosen: broader exploration helps the agent avoid local minima and can uncover previously unknown, better paths to the goal state. Moreover, to help the agent learn the terminal action, the agent is forced to take that action whenever the current region has an IoU above a set threshold, which in turn accelerates the ...
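
A minimal sketch of this selection rule in Python, assuming hypothetical names such as `TERMINAL_ACTION` and an illustrative IoU threshold of 0.5 (the exact threshold value is not stated above):

```python
import random

import numpy as np

EPSILON_START = 1.0   # full exploration, as described in the text
EPSILON_MIN = 0.1     # 10% exploration, 90% exploitation
EPSILON_STEP = 0.1    # epsilon decays in steps of 0.1

TERMINAL_ACTION = 0   # hypothetical index of the terminal action
IOU_THRESHOLD = 0.5   # assumed value; the text only says "IoU above a threshold"


def select_action(q_values, epsilon, iou):
    """Epsilon-greedy selection with the forced terminal action."""
    # Force the terminal action when the current region already
    # overlaps the ground truth well enough.
    if iou > IOU_THRESHOLD:
        return TERMINAL_ACTION
    # With probability epsilon, explore with a random action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # Otherwise exploit the current Q-value estimates.
    return int(np.argmax(q_values))


def decay_epsilon(epsilon):
    """Linearly decay epsilon by 0.1 down to the 0.1 floor."""
    return max(EPSILON_MIN, epsilon - EPSILON_STEP)
```

In training, `decay_epsilon` would typically be called once per epoch (or per fixed number of steps), so the agent moves gradually from pure exploration toward mostly exploiting its learned Q-values.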