N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/978-1-4842-6809-4_3

3. Model-Based Algorithms

Nimish Sanghi¹

(1) Bangalore, India

In Chapter 2, we talked about the parts of the setup that form the agent and the parts that form the environment. The agent observes the state S_t = s and learns a policy π(a | s) that maps states to actions. The agent uses this policy to take an action A_t = a when in state S_t = s. The system then transitions to the next time step, t + 1. The environment responds to the action A_t = a by putting the agent in a new state S_{t+1} = s′ and providing feedback to the agent in the form of a reward R_{t+1}. The agent has no control over what the new state S_{t+1} and reward R_{t+1} will be. ...
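
The interaction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the TwoStateEnv class, its 0.8 transition probability, its reward rule, and the fixed policy table are all hypothetical choices made for the example. It shows the agent sampling A_t from π(a | s), and the environment alone deciding S_{t+1} and R_{t+1}.

```python
import random


class TwoStateEnv:
    """Hypothetical toy environment: states {0, 1}, actions {0, 1}."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # Start every episode in state 0 and return it as S_0.
        self.state = 0
        return self.state

    def step(self, action):
        # The environment, not the agent, decides S_{t+1} and R_{t+1}.
        # Assumed dynamics: with probability 0.8 the next state equals the
        # chosen action; otherwise it is the other state.
        if self.rng.random() < 0.8:
            next_state = action
        else:
            next_state = 1 - action
        # Assumed reward rule: landing in state 1 pays 1.0, state 0 pays 0.0.
        reward = 1.0 if next_state == 1 else 0.0
        self.state = next_state
        return next_state, reward


def policy(state, rng):
    """Sample an action from a fixed stochastic policy pi(a | s)."""
    # Rows are states; entries are the probabilities of actions 0 and 1.
    action_probs = {0: [0.5, 0.5], 1: [0.1, 0.9]}[state]
    return rng.choices([0, 1], weights=action_probs)[0]


def run_episode(env, steps=10, seed=1):
    """Run the agent-environment loop and record (S_t, A_t, R_{t+1}, S_{t+1})."""
    rng = random.Random(seed)
    state = env.reset()
    trajectory = []
    for _ in range(steps):
        action = policy(state, rng)            # agent chooses A_t given S_t
        next_state, reward = env.step(action)  # environment responds
        trajectory.append((state, action, reward, next_state))
        state = next_state                     # move on to time step t + 1
    return trajectory


trajectory = run_episode(TwoStateEnv())
for s, a, r, s_next in trajectory:
    print(f"S_t={s}  A_t={a}  R_t+1={r}  S_t+1={s_next}")
```

Each printed row is one time step. The agent's only influence on the outcome is its choice of action; the next state and the reward come back from the step method.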
