Algorithm diversity
Why are there so many types of RL algorithms? Because no single algorithm is better than all the others in every context. Each one is designed for different needs and addresses different concerns. The most notable differences are stability, sample efficiency, and wall-clock time (training time). These will become clearer as we progress through the book, but as a rule of thumb, policy gradient algorithms are more stable and reliable than value function algorithms. On the other hand, value function methods are more sample efficient because they are off-policy and can reuse prior experience. In turn, model-based algorithms are more sample efficient than Q-learning algorithms, but their computational cost is much higher ...
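To make the sample-efficiency point concrete, here is a minimal sketch of an experience replay buffer, a mechanism commonly used to let off-policy methods reuse prior experience. The `ReplayBuffer` class and its method names are illustrative assumptions, not code from the book.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (hypothetical helper for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one environment step as a tuple.
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw a random batch without replacement; the same transition
        # can be drawn again in later batches.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

An off-policy learner can call `sample` many times on the same stored transitions, so each environment step may contribute to several updates. An on-policy method, by contrast, typically discards its data after each policy update and must collect fresh experience.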