© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2024
N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/979-8-8688-0273-7_4

4. Model-Free Approaches

Nimish Sanghi, Bangalore, India

The previous chapter looked at dynamic programming, which is used when you know the model dynamics p(s′, r | s, a) and can use that knowledge to "plan" the optimal actions. This is also known as the planning problem. This chapter shifts focus to learning problems, that is, setups where the model dynamics (aka transition dynamics) are not known. You will learn to estimate state values and state-action values by sampling, that is, by collecting experience while following some policy in the real world or running ...
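To make "learning values by sampling" concrete, here is a minimal sketch of first-visit Monte Carlo prediction, one model-free way to estimate state values. It assumes a hypothetical five-state random walk (`RandomWalkEnv`: non-terminal states 1 to 5, terminal states 0 and 6, reward +1.0 only on reaching state 6) and a uniform random policy. All class and function names are illustrative and are not taken from the book. The learner never reads p(s′, r | s, a). It only calls `reset()` and `step()` and averages the returns observed after the first visit to each state.

```python
import random
from collections import defaultdict


class RandomWalkEnv:
    """Hypothetical 5-state random walk, used only for illustration.

    Non-terminal states are 1..5; states 0 and 6 are terminal.
    An action of -1 or +1 moves the agent left or right, and the
    reward is +1.0 only on entering state 6 (0.0 otherwise).
    """

    def __init__(self):
        self.state = 3

    def reset(self):
        self.state = 3  # every episode starts in the middle state
        return self.state

    def step(self, action):
        self.state += action
        done = self.state in (0, 6)
        reward = 1.0 if self.state == 6 else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    """Behaviour policy: move left or right with equal probability."""
    return rng.choice((-1, 1))


def generate_episode(env, policy, rng):
    """Follow `policy` until termination; return [(state, reward), ...],
    where reward is the one received after acting in that state."""
    episode = []
    state = env.reset()
    done = False
    while not done:
        next_state, reward, done = env.step(policy(state, rng))
        episode.append((state, reward))
        state = next_state
    return episode


def mc_first_visit_prediction(env, policy, num_episodes, gamma=1.0, seed=0):
    """Estimate V(s) by averaging first-visit returns over sampled episodes.

    Only env.reset()/env.step() are used: no transition probabilities."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode(env, policy, rng)
        # Walk backwards so that G_t = r_{t+1} + gamma * G_{t+1}.
        g = 0.0
        returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.append((state, g))
        returns.reverse()  # restore chronological order
        seen = set()
        for state, g in returns:
            if state not in seen:  # count only the first visit to each state
                seen.add(state)
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_count}


if __name__ == "__main__":
    V = mc_first_visit_prediction(RandomWalkEnv(), random_policy, num_episodes=20_000)
    for s in sorted(V):
        print(f"state {s}: estimate {V[s]:.3f}, true value {s / 6:.3f}")
```

With gamma = 1, the true value of state s under this policy is s/6 (the probability of finishing on the right), so the sampled estimates can be checked against a known answer even though the learner itself never uses the model.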
