October 2019
Intermediate to advanced
340 pages
8h 39m
English
We can actually implement the MC method incrementally. Instead of storing the return and importance ratio for each first-occurring state-action pair in an episode, we can calculate the Q-function on the fly. In the non-incremental approach, the Q-function is computed at the end from all the returns stored over n episodes:
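
$$Q_n(s, a) = \frac{1}{n} \sum_{i=1}^{n} w_i G_i$$

Here, $G_i$ is the return following the first occurrence of $(s, a)$ in the $i$-th of the $n$ episodes in which it occurs, and $w_i$ is the corresponding importance ratio.
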
In the incremental approach, by contrast, the Q-function is updated at each step of an episode as follows:
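
$$Q_n(s, a) = Q_{n-1}(s, a) + \frac{1}{n}\left(w_n G_n - Q_{n-1}(s, a)\right)$$

Here, $Q_{n-1}(s, a)$ is the estimate before the $n$-th episode in which $(s, a)$ occurs, and $G_n$ and $w_n$ are the return and importance ratio from that episode.
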
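To check that the incremental form really is equivalent to averaging at the end, here is a minimal Python sketch that feeds the same samples to both versions. It assumes the Q-value is the plain average of importance-weighted returns, w·G, over the n episodes in which a pair occurs (ordinary importance sampling). The function names `batch_q` and `incremental_q` and the sample data are hypothetical; each sample stands in for one first-occurrence (state-action pair, importance ratio, return) collected from an episode.

```python
from collections import defaultdict

def batch_q(samples):
    """Non-incremental (hypothetical helper): store every weighted return, average at the end."""
    stored = defaultdict(list)
    for pair, w, g in samples:
        stored[pair].append(w * g)  # memory grows with the number of episodes
    return {pair: sum(vals) / len(vals) for pair, vals in stored.items()}

def incremental_q(samples):
    """Incremental (hypothetical helper): keep only a running estimate and a count per pair."""
    q = defaultdict(float)
    n = defaultdict(int)
    for pair, w, g in samples:
        n[pair] += 1
        # Q_n = Q_{n-1} + (1/n) * (w_n * G_n - Q_{n-1})
        q[pair] += (w * g - q[pair]) / n[pair]
    return dict(q)

# Hypothetical samples: (state-action pair, importance ratio w, return G)
samples = [
    (("s0", 0), 1.0, 2.0),
    (("s0", 0), 2.0, -1.0),
    (("s1", 1), 0.5, 4.0),
    (("s0", 0), 1.0, 3.0),
]

print(batch_q(samples))        # {('s0', 0): 1.0, ('s1', 1): 2.0}
print(incremental_q(samples))  # {('s0', 0): 1.0, ('s1', 1): 2.0}
```

Dividing by the running count `n[pair]`, rather than using a constant step size, is what makes the incremental estimate match the batch average (up to floating-point rounding).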
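The incremental version is more efficient: it keeps only the current estimate and a visit count for each state-action pair instead of every stored return, so its memory use does not grow with the number of episodes and it scales better. Let's go ...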