Index
𝜖-greedy, 917–921
2 × 2 cube model, 1121
3 × 3 cube model, 1124
A2C baseline, 813
    implementation, 814, 815
    results, 816–819
    video recording, 820
A2C method, 770, 771
    implementing, 772, 773, 775–777
    models, using, 780, 782
    results, 778, 779
    videos, recording, 781, 783
A2C on Pong, 605–611
A3C, with data parallelism, 622
    results, 623
A3C, with gradient parallelism, 624
    implementation, 625–630
    results, 631
ACKTR, 837
    implementation, 838
    results, 839, 840
action selector, 330, 332, 333
actions, 67
    continuous action, 71
    discrete actions, 70
actor-critic parallelization
    data parallelism, 619
    gradients parallelism, 620, 621
advantage actor-critic (A2C), ...