O'Reilly logo

Markov Decision Processes in Artificial Intelligence by Olivier Buffet, Olivier Sigaud

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 5

Policy-Gradient Algorithms 1

Most MDP solving methods evaluate a value function which makes it possible to identify optimal actions in each state. A first difficulty for these approaches is to handle large size problems. As shown in Chapter 3, a solution is to use methods approximating the value function. But the monotonous policy improvement is not guaranteed anymore for algorithms like value iteration or policy iteration. A second difficulty is to handle partial observations, a setting where the Markov property is not necessarily verified.

A very different approach is to perform a direct policy search, the policy taking the form of a parameterized controller. The choice of this controller makes it possible to adapt to the type of problem being considered and to make a compromise between expressing a large variety of policies and reducing the memory usage. We end up with a parameter optimization problem for which algorithms exist that guarantee a monotonous improvement of the policy towards a local optimum, even - in some cases - under partial observability.

EXAMPLE 5.1. What about the cars that need maintenance then?1 A garage mechanic often knows the procedures (or algorithms) he has to follow to diagnose and repair a vehicle. But these procedures may vary depending on the model or the age of the vehicle. Thus, it can be more cost-effective to change the brake discs without testing them if their age is beyond an age limit. The methods we will discuss here aim at optimizing/learning ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required