Chapter 7

Partially Observable Markov Decision Processes

Although most real-life systems can be modeled as Markov processes, the agent trying to control or to learn to control such a system often does not have enough information to infer its real state. The agent observes the process but does not know its state. The framework of Partially Observable Markov Decision Processes (POMDPs) was designed specifically to deal with this kind of situation, where an agent has only partial knowledge of the process to control.
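For reference, a POMDP can be written formally as a tuple extending the MDP model with observations; the notation below is the standard one, and the exact symbols adopted later in this chapter may differ slightly:

$$\langle S, A, \Omega, p, O, r \rangle,$$

where $S$, $A$ and $\Omega$ are the sets of states, actions and observations, $p(s' \mid s, a)$ is the transition function, $r(s, a)$ the reward function, and $O(o \mid s', a)$ the observation function giving the probability of observing $o$ when action $a$ has led the process to state $s'$. Since the state is hidden, the agent maintains a belief $b$, a probability distribution over $S$, updated by Bayes' rule after each action $a$ and observation $o$:

$$b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} p(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)}.$$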

EXAMPLE 7.1. The car maintenance example developed in the previous chapters (see sections 1.1 and 2.1) implicitly relied on the fact that the state of the car was known. More often than not, this is not the case, as no one constantly checks the waterproofness of the cylinder head gasket or the wear of the brake lining. A quick look-over does not give us the exact state of the car, only an observation carrying partial and unreliable information. The framework of POMDPs makes it possible to model this kind of problem, and the solutions obtained with it show how to make optimal decisions despite partial information. But, once again, the dynamics of the process must be known, that is to say the consequences of the possible actions (transitions), the costs of actions (rewards) and the probability of each observation when in a given state (observation function).
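To make this concrete, here is a minimal Python sketch of a car maintenance POMDP with a Bayesian belief update. The states, actions, observations and probabilities are illustrative assumptions, not values taken from the book, and for simplicity the observation is assumed to depend only on the state reached.

```python
# Illustrative car maintenance POMDP: a cylinder head gasket that may be
# worn, observed only through an unreliable look-over. All numbers below
# are assumptions chosen for the sketch.

STATES = ["gasket_ok", "gasket_worn"]
ACTIONS = ["drive", "repair"]
OBSERVATIONS = ["looks_fine", "leak_spotted"]

# Transition function p(s' | s, a): driving may wear the gasket out,
# repairing restores it.
P = {
    ("gasket_ok", "drive"):    {"gasket_ok": 0.95, "gasket_worn": 0.05},
    ("gasket_worn", "drive"):  {"gasket_ok": 0.0,  "gasket_worn": 1.0},
    ("gasket_ok", "repair"):   {"gasket_ok": 1.0,  "gasket_worn": 0.0},
    ("gasket_worn", "repair"): {"gasket_ok": 1.0,  "gasket_worn": 0.0},
}

# Observation function O(o | s'): a quick look-over is only partially
# reliable, so a worn gasket can still look fine.
O = {
    "gasket_ok":   {"looks_fine": 0.9, "leak_spotted": 0.1},
    "gasket_worn": {"looks_fine": 0.4, "leak_spotted": 0.6},
}

def belief_update(belief, action, observation):
    """Bayes update: b'(s') is proportional to O(o | s') * sum_s p(s' | s, a) b(s)."""
    new_belief = {}
    for s2 in STATES:
        predicted = sum(P[(s, action)][s2] * belief[s] for s in STATES)
        new_belief[s2] = O[s2][observation] * predicted
    norm = sum(new_belief.values())  # Pr(o | b, a)
    return {s: v / norm for s, v in new_belief.items()}

# Start fully uncertain, drive, then spot a leak during a look-over:
b0 = {"gasket_ok": 0.5, "gasket_worn": 0.5}
b1 = belief_update(b0, "drive", "leak_spotted")
print(b1)  # belief shifts strongly toward "gasket_worn" (about 0.87)
```

The belief, rather than the hidden state, is what the agent's decisions are based on; the update above is the standard state-estimation step that later sections build on.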

Difficulties raised by the application of dynamic programming ...
