Modeling and solving a sequential decision problem with an MDP requires several strong assumptions: a mono-criterion preference representation, complete and precise knowledge of the environment at each step, knowledge of the model itself, a well-defined probabilistic representation of uncertainty, etc.
Among these assumptions, we have seen that some can be relaxed: POMDPs take partial knowledge of the environment into account, and reinforcement learning methods dispense with knowledge of the model itself. In this chapter, we focus on the two remaining limitations: the mono-criterion preference representation and the well-defined probabilistic representation of uncertainty.
More specifically, we begin by describing the formalism of multicriteria, or vector-valued, MDPs [FUR 80, WHI 82, WAK 01], which extends the MDP framework to multicriteria decision-making. Within this framework, we present an algorithm [MOU 04] that heuristically computes satisfactory policies, i.e. policies that try to be close to an ideal solution.
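To make the vector-valued setting concrete, the following sketch shows a small MDP whose rewards are vectors (one component per criterion), an "ideal point" obtained by optimizing each criterion separately, and a simple heuristic that scalarizes the vector Q-values by their Chebyshev distance to that ideal point. This is only an illustration of the general idea of seeking policies close to an ideal solution; it is not the algorithm of [MOU 04], and all numbers, names and dimensions below are hypothetical.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP with 2 criteria (all numbers illustrative).
n_states, n_actions, n_crit = 2, 2, 2
gamma = 0.9

# P[s, a, s']: transition probabilities.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
# R[s, a, k]: one reward per criterion k (vector-valued reward function).
R = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [1.0, 0.2]]])

def value_iteration(reward, n_iter=500):
    """Standard scalar value iteration for one criterion's reward R[s, a]."""
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = reward + gamma * P @ V          # Q[s, a]
        V = Q.max(axis=1)
    return V

# Ideal point: best achievable value of each criterion taken in isolation.
ideal = np.stack([value_iteration(R[:, :, k]) for k in range(n_crit)], axis=-1)

def chebyshev_policy(n_iter=500):
    """Greedy policy w.r.t. Chebyshev distance of vector Q-values to the ideal."""
    V = np.zeros((n_states, n_crit))        # vector-valued value function
    for _ in range(n_iter):
        Q = R + gamma * np.einsum('sat,tk->sak', P, V)    # Q[s, a, k]
        dist = np.abs(ideal[:, None, :] - Q).max(axis=2)  # distance per (s, a)
        best = dist.argmin(axis=1)                        # closest-to-ideal action
        V = Q[np.arange(n_states), best]
    return best
```

Such a heuristic cannot in general recover every Pareto-optimal policy, which is why the chapter's treatment devotes attention to what "satisfactory" means in the multicriteria case.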
We then describe a first approach, called robust [GIV 00, BAG 01, NIL 04, NIL 05], for solving an MDP whose model is imperfectly known. This approach remains within the standard probabilistic framework, but does not assume that the transition function and the reward function are perfectly known. Thanks to certain assumptions (knowledge of an interval for the probabilities and the rewards), ...
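As a minimal sketch of this interval-based robust approach, the code below runs a robust value iteration on a toy MDP whose transition probabilities are only known to lie within lower/upper bounds: at each step, the value of an action is evaluated against the worst-case distribution compatible with the intervals. The greedy inner minimization (push mass toward low-value states first) exploits the interval structure; all numbers are hypothetical, and the rewards are assumed exactly known here for simplicity.

```python
import numpy as np

# Hypothetical interval MDP (all numbers illustrative).
gamma = 0.9
n_states, n_actions = 2, 2
# Lower and upper bounds on P[s, a, s'].
P_lo = np.array([[[0.6, 0.1], [0.0, 0.7]],
                 [[0.3, 0.3], [0.7, 0.0]]])
P_hi = np.array([[[0.9, 0.4], [0.3, 1.0]],
                 [[0.7, 0.7], [1.0, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])                  # R[s, a], assumed exactly known

def worst_case_expectation(lo, hi, V):
    """min p.V over probability vectors p with lo <= p <= hi and sum(p) = 1.

    Greedy solution: start from the lower bounds, then assign the remaining
    mass to successor states in increasing order of V, up to their upper bound.
    """
    p = lo.copy()
    budget = 1.0 - p.sum()
    for s in np.argsort(V):                 # worst (lowest-value) states first
        extra = min(hi[s] - p[s], budget)
        p[s] += extra
        budget -= extra
    return p @ V

def robust_value_iteration(n_iter=500):
    """Max over actions of the worst-case expected value (robust Bellman update)."""
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = np.array([[R[s, a] + gamma *
                       worst_case_expectation(P_lo[s, a], P_hi[s, a], V)
                       for a in range(n_actions)] for s in range(n_states)])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

The max-min structure of the update is what makes the computed policy robust: it is optimal against an adversarial choice of model within the stated intervals.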