October 2019
Intermediate to advanced
340 pages
8h 39m
English
In this recipe, we solve the Blackjack game using on-policy MC control with exploring starts. The method reaches our policy optimization goal by alternating between policy evaluation and policy improvement after each episode we simulate.
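The alternation between evaluation and improvement can be sketched in plain Python. This is a minimal illustration, not the recipe's own code: it assumes a simplified Blackjack (infinite deck, no special payout for a natural, dealer draws until reaching 17), and the names `score`, `greedy`, `run_episode`, and `mc_control_exploring_starts` are hypothetical. A random deal followed by a random first action stands in for exploring starts, which does not strictly guarantee that every state-action pair can begin an episode.

```python
import random

# Assumption: infinite deck; aces are 1 and tens/face cards all count as 10.
CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
STICK, HIT = 0, 1


def score(hand):
    """Return (total, usable_ace), counting one ace as 11 when it fits."""
    total = sum(hand)
    usable = 1 in hand and total + 10 <= 21
    return (total + 10 if usable else total), usable


def greedy(Q, state):
    """Pick the action with the highest current Q estimate (ties go to STICK)."""
    return max((STICK, HIT), key=lambda a: Q.get((state, a), 0.0))


def run_episode(Q, rng):
    """Play one hand: random first action, greedy with respect to Q afterwards."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS), rng.choice(CARDS)]
    trajectory, first = [], True
    while True:
        total, usable = score(player)
        state = (total, dealer[0], usable)
        action = rng.choice((STICK, HIT)) if first else greedy(Q, state)
        first = False
        trajectory.append((state, action))
        if action == STICK:
            break
        player.append(rng.choice(CARDS))
        if score(player)[0] > 21:
            return trajectory, -1  # player busts
    while score(dealer)[0] < 17:
        dealer.append(rng.choice(CARDS))
    p, d = score(player)[0], score(dealer)[0]
    reward = 1 if d > 21 or p > d else (0 if p == d else -1)
    return trajectory, reward


def mc_control_exploring_starts(n_episodes, seed=0):
    rng = random.Random(seed)
    Q, counts = {}, {}
    for _ in range(n_episodes):
        trajectory, reward = run_episode(Q, rng)
        # Evaluation: with no discounting and a single terminal reward,
        # the return from every visited (state, action) equals that reward.
        for key in trajectory:
            counts[key] = counts.get(key, 0) + 1
            old = Q.get(key, 0.0)
            Q[key] = old + (reward - old) / counts[key]  # running average
        # Improvement is implicit: the next episode acts greedily on the new Q.
    policy = {state: greedy(Q, state) for (state, _action) in Q}
    return Q, policy


if __name__ == "__main__":
    Q, policy = mc_control_exploring_starts(50_000)
    state = (20, 10, False)  # player 20, dealer shows 10, no usable ace
    print("action at", state, "->", "stick" if policy[state] == STICK else "hit")
```

Keeping the policy implicit in the Q-function means there is no separate improvement pass: each update to `Q` immediately changes how the next episode is played.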
In Step 2, we run an episode and take actions under a Q-function by performing the following tasks:
It is important to note that the first ...