10 Sample-efficient value-based methods
In this chapter
- You will implement a deep neural network architecture that exploits some of the nuances of value-based deep reinforcement learning methods (sketched after this list).
- You will create a replay buffer that prioritizes experiences by how surprising they are (also sketched after this list).
- You will build an agent that trains to a near-optimal policy in fewer episodes than any of the value-based deep reinforcement learning agents we’ve discussed so far.
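Since the chapter has not yet introduced the architecture, the following is only a minimal sketch of one such design, a dueling-style Q-network in PyTorch (an assumption based on the book's usual framework). The class name `DuelingQNetwork` and parameters such as `state_dim` and `hidden_dim` are illustrative, not the book's own code. The idea is to split the network into a state-value stream and an advantage stream, then recombine them.

```python
# Minimal sketch (assumed names): a dueling-style Q-network.
# Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        # shared feature extractor
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
        )
        # state-value stream: one scalar per state
        self.value = nn.Linear(hidden_dim, 1)
        # advantage stream: one value per action
        self.advantage = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        x = self.features(state)
        v = self.value(x)          # shape: (batch, 1)
        a = self.advantage(x)      # shape: (batch, action_dim)
        # subtracting the mean advantage keeps V and A identifiable
        return v + a - a.mean(dim=1, keepdim=True)
```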
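Likewise, here is a minimal sketch of a replay buffer that prioritizes experiences by how surprising they are, with surprise measured as the absolute TD error. The class name `PrioritizedReplayBuffer` and the `alpha`/`beta` hyperparameters are assumptions in the spirit of proportional prioritized replay, not the book's implementation; practical versions typically use a sum-tree rather than a flat array.

```python
# Minimal sketch (assumed names): proportional prioritized replay.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def store(self, experience):
        # new experiences get the current max priority so they are
        # sampled at least once before their TD error is known
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(experience)
        else:
            self.buffer[self.pos] = experience
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # importance-sampling weights correct the bias from
        # non-uniform sampling; beta is typically annealed toward 1
        weights = (len(self.buffer) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        samples = [self.buffer[i] for i in idxs]
        return idxs, samples, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # "surprise" is measured by the absolute TD error
        self.priorities[idxs] = np.abs(td_errors) + eps
```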
Intelligence is based on how efficient a species became at doing the things they need to survive.
— Charles Darwin, English naturalist, geologist, and biologist, best known for his contributions to the science of evolution
In the previous chapter, we improved on NFQ with the implementation of DQN ...