4. Model-Free Approaches

The previous chapter looked at dynamic programming, which is used when you know the model dynamics p(s′, r | s, a) and can exploit that knowledge to "plan" the optimal actions. This is known as the planning problem. This chapter shifts focus to learning problems—that is, setups where the model dynamics (also called transition dynamics) are not known. You will learn to estimate state values and state-action values by sampling—that is, by collecting experience while following some policy in the real world or running ...
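To make the idea of estimating values purely from sampled experience concrete, here is a minimal sketch (not from the book) of first-visit Monte Carlo prediction on a hypothetical five-state random walk. The agent never queries the transition dynamics directly; it only generates episodes under a fixed random policy and averages the observed returns per state.

```python
import random

# Hypothetical 5-state random walk: states 0..4, start at state 2.
# Episodes terminate at state 0 (return 0) or state 4 (return 1).
# The learner only samples trajectories; it never reads these dynamics.
def sample_episode(start=2):
    states, state = [], start
    while state not in (0, 4):
        states.append(state)
        state += random.choice((-1, 1))  # uniform random policy
    return states, 1.0 if state == 4 else 0.0

def mc_state_values(num_episodes=50_000, seed=0):
    random.seed(seed)
    totals = {s: 0.0 for s in (1, 2, 3)}
    counts = {s: 0 for s in (1, 2, 3)}
    for _ in range(num_episodes):
        states, ret = sample_episode()
        for s in set(states):  # first-visit Monte Carlo update
            totals[s] += ret
            counts[s] += 1
    # Estimated value = average return following the first visit to s
    return {s: totals[s] / counts[s] for s in totals}

values = mc_state_values()
```

For this walk the true values under the random policy are 0.25, 0.5, and 0.75 for states 1, 2, and 3, and the sampled averages converge toward them as the episode count grows; the same average-of-returns idea underlies the model-free methods developed in this chapter.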