In Chapter 2, we talked about the parts of the setup that form the agent and the part that forms the environment. The agent gets the state St = s and learns a policy π(s| a) that maps states to actions. The agent uses this policy to take an action At = a when in state St = s. The system transitions to the next time instant of t + 1. The environment responds to the action (At = a) by putting the agent in a new state of St + 1 = s’ and providing feedback to the agent in terms of a reward, Rt + 1. The agent has no control over what the new state St + 1 and reward Rt + 1 will be. ...
3. Model-Based Algorithms
Get Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.