January 2018
Beginner to intermediate
284 pages
8h 35m
English
As explained in previous sections, any RL-based system has a few core components: a value function v(s, θ) or a Q-function Q(s, a, θ), and a policy function π(a|s, θ), which could be model-free or model-based. The wide-scale applicability of any RL-based system depends on how good these estimates are. In practice, existing Q-learning systems suffer from multiple drawbacks: