October 2019
Intermediate to advanced
340 pages
8h 39m
English
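In this recipe, we solved the Blackjack problem using off-policy MC control with weighted importance sampling. It is quite similar to ordinary importance sampling: both scale each return by its importance ratio. The difference lies in the normalization. Ordinary importance sampling divides the sum of the scaled returns by the number of episodes, whereas weighted importance sampling divides it by the sum of the ratios, which yields a weighted average of the returns. In practice, weighted importance sampling has much lower variance than ordinary importance sampling (at the cost of a bias that shrinks as more episodes are collected) and is therefore strongly preferred.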
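To make the difference in normalization concrete, here is a minimal sketch in plain Python. It is not the recipe's code: the function names (`ordinary_is`, `weighted_is`, `weighted_is_incremental`) are hypothetical, and the returns and importance ratios are made-up illustrative numbers rather than values from a real Blackjack run. The incremental variant assumes the usual cumulative-weight update, in which an estimate is moved toward each new return by the fraction `w / c`.

```python
def ordinary_is(returns, ratios):
    """Ordinary importance sampling: scale each return by its ratio,
    then divide the sum by the number of episodes."""
    scaled = [w * g for g, w in zip(returns, ratios)]
    return sum(scaled) / len(returns)


def weighted_is(returns, ratios):
    """Weighted importance sampling: scale each return by its ratio,
    then divide the sum by the sum of the ratios."""
    total_weight = sum(ratios)
    if total_weight == 0:
        return 0.0  # no episode is consistent with the target policy
    scaled = [w * g for g, w in zip(returns, ratios)]
    return sum(scaled) / total_weight


def weighted_is_incremental(returns, ratios):
    """Same estimate as weighted_is, computed one episode at a time
    with a running cumulative weight c (assumed update rule)."""
    q, c = 0.0, 0.0
    for g, w in zip(returns, ratios):
        if w == 0:
            continue  # a zero ratio contributes nothing
        c += w
        q += (w / c) * (g - q)
    return q


# Illustrative (made-up) episode returns in {-1, +1} and importance ratios.
returns = [1.0, -1.0, 1.0, -1.0]
ratios = [2.0, 0.0, 8.0, 2.0]

print("ordinary:", ordinary_is(returns, ratios))
print("weighted:", weighted_is(returns, ratios))
print("weighted (incremental):", weighted_is_incremental(returns, ratios))
```

With these numbers, the ordinary estimate is 2.0, which lies outside the range of possible returns (-1 to 1), while the weighted estimate is about 0.67 and always stays within that range. This boundedness is one intuition for why the weighted estimator has lower variance.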