
Markov Decision Processes in Artificial Intelligence by Olivier Buffet, Olivier Sigaud


Chapter 6

Online Resolution Techniques

6.1. Introduction

We have seen in previous chapters how to approximately solve large MDPs using various techniques based on a parameterized or structured representation of policies and/or value functions, and on the use of simulation for reinforcement learning (RL) techniques. The optimization process returns an approximate optimal policy π that is valid over the whole state space. For very large MDPs, obtaining a good approximation is often difficult, all the more so because, in general, we do not know how to precisely quantify the policy's sub-optimality a priori. A possible improvement consists of treating these optimization methods as an offline pre-computation; in a second, online phase, the a priori policy is improved by a non-elementary computation for each encountered state.
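To make this two-phase scheme concrete, here is a minimal sketch in Python under assumed data structures (the names P, R, V_offline, gamma are hypothetical, not the book's notation): P[s][a] maps successor states to their probabilities, R(s, a) is the expected reward, gamma is the discount factor, and V_offline is the value function pre-computed offline. The online computation improves on the offline estimate by running a depth-limited lookahead rooted at the current state and falling back on V_offline at the leaves.

def lookahead_value(s, depth, actions, P, R, V_offline, gamma):
    # Depth-limited expectimax value of state s, backed by the
    # offline estimate V_offline once the horizon is exhausted.
    if depth == 0:
        return V_offline[s]
    return max(
        R(s, a) + gamma * sum(
            p * lookahead_value(s2, depth - 1, actions, P, R, V_offline, gamma)
            for s2, p in P[s][a].items()
        )
        for a in actions
    )

def online_action(s, depth, actions, P, R, V_offline, gamma):
    # The non-elementary online computation at the current state:
    # choose the action whose depth-limited lookahead value is best.
    return max(
        actions,
        key=lambda a: R(s, a) + gamma * sum(
            p * lookahead_value(s2, depth - 1, actions, P, R, V_offline, gamma)
            for s2, p in P[s][a].items()
        ),
    )

With depth = 1 this reduces to the elementary one-step greedy rule sketched in the next section; larger depths spend more online time per decision in exchange for correcting local errors of the offline approximation.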

6.1.1. Exploiting time online

In the framework of MDPs, the algorithm used to determine the current action online is generally very simple. For example, when π is defined through a value function V, this algorithm is a simple comparison of the actions' values “at one step” (see Algorithm 1.3 in Chapter 1). Similarly, in the case of a parameterized ...
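For reference, here is a minimal sketch of this one-step comparison, with the same hypothetical names as above (P, R, V, gamma are assumptions, not the book's notation):

def greedy_action(s, actions, P, R, V, gamma):
    # Compare the actions' values "at one step" and return the best:
    # Q(s, a) = R(s, a) + gamma * sum over s' of P(s'|s, a) * V(s').
    def q(a):
        return R(s, a) + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
    return max(actions, key=q)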
