The approaches we've shown so far can do a good job of learning all kinds of tasks, but an agent trained in these ways can still suffer from significant limitations:
It trains very slowly; a human can learn a game like Pong from a couple of plays, while Q-learning may need millions of playthroughs to reach a similar level.
For games that require long-term planning, all of these techniques perform very badly. Imagine a platform game where a player must retrieve a key from one side of a room to open a door on the other side. There will rarely be a passage of play in which this whole sequence occurs, and even then, the chance of learning that it was the key that led to the extra reward from the door is minuscule.
It cannot formulate a strategy ...
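To make the long-term planning problem concrete, here is a minimal sketch of the key-and-door scenario reduced to a one-dimensional corridor. The environment, the function names (`random_episode`, `success_rate`), and the parameters (`length`, `max_steps`) are all assumptions made for illustration and do not come from the text. The agent moves left or right uniformly at random, the key sits at the left end, and the door at the right end only counts as reached once the key has been picked up.

```python
import random


def random_episode(rng, length=10, max_steps=50):
    """Return True if a random agent fetches the key and then reaches the door.

    Hypothetical toy environment: positions 0..length, key at 0, door at length.
    """
    pos = length // 2          # start in the middle of the corridor
    has_key = False
    for _ in range(max_steps):
        pos += rng.choice((-1, 1))          # random step left or right
        pos = max(0, min(length, pos))      # walls at both ends
        if pos == 0:
            has_key = True                  # picked up the key
        if pos == length and has_key:
            return True                     # opened the door: the only reward
    return False


def success_rate(episodes=10_000, seed=0, **kwargs):
    """Fraction of random episodes that end with the door opened."""
    rng = random.Random(seed)
    wins = sum(random_episode(rng, **kwargs) for _ in range(episodes))
    return wins / episodes


if __name__ == "__main__":
    for steps in (14, 50, 500):
        print(steps, success_rate(max_steps=steps))
```

With the default corridor, at least 15 moves are needed (5 to the key, then 10 to the door), so any shorter episode can never produce a reward, and short episodes succeed only occasionally. An agent that explores randomly therefore sees very few rewarded episodes to learn from, and even in those, nothing marks the key pickup as the step that mattered.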