What this book covers
Chapter 1, Understanding Rewards-Based Learning, explores the basics of learning, what it is to learn, and how RL differs from other, more classic learning methods. From there, we explore how the Markov decision process works in code and how it relates to learning. This leads us to the classic multi-armed and contextual bandit problems. Finally, we will learn about Q-learning and quality-based model learning.
Chapter 2, Dynamic Programming and the Bellman Equation, digs deeper into dynamic programming and explores how the Bellman equation can be intertwined into RL. Here, you will learn how the Bellman equation is used to update a policy. We then go further into detail about policy iteration or value iteration methods ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access