April 2018
Intermediate to advanced
334 pages
10h 18m
English
In Chapter 3, Markov Decision Process, we discussed the transition model of the environment, which follows the Markov property, along with delayed rewards and value (or utility) functions. In this chapter, we revisit the Markov decision process, learn about Q-learning, and study a modified approach called the deep Q-network (DQN), which generalizes across different environments.
We will cover the following topics in this chapter: