11 Reinforcement Learning

Reinforcement learning is an area of machine learning concerned with how agents should take actions in an environment in order to maximize a cumulative reward. The idea goes back to the work of the Russian Nobel laureate Ivan Pavlov on classical conditioning, in which he showed that you can train dogs by rewarding or punishing them. To formalize this, we consider the dog to be in different states at different stages in time. The state at the next stage depends on the current state and the action we take during our training. Every state is also associated with a reward. It is then desirable to maximize the sum, possibly discounted, of the rewards over some set of stages, the number of which could be infinite. In reinforcement learning, the action is ...
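As a rough illustration of the "possibly discounted" sum of rewards mentioned above, the following minimal Python sketch adds up a finite sequence of rewards. The function name `discounted_return`, the discount factor `gamma`, and the sample rewards are assumptions made for this example only; they are not notation taken from the book.

```python
def discounted_return(rewards, gamma=1.0):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite list of rewards.

    `gamma` is a hypothetical discount factor; gamma=1.0 gives the plain sum.
    """
    total = 0.0
    # Accumulate from the last stage backwards so that each earlier
    # stage multiplies everything after it by one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards collected over three stages.
rewards = [1.0, 0.0, 2.0]
undiscounted = discounted_return(rewards)           # 1 + 0 + 2
discounted = discounted_return(rewards, gamma=0.5)  # 1 + 0.5*0 + 0.25*2
```

With a discount factor below one, rewards at later stages count for less, which is also what keeps the sum finite when the number of stages is allowed to grow without bound and the rewards are bounded.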


Optimization for Learning and Control by Anders Hansson, Martin Andersen
