In the previous chapters, we considered a planning setting in which the algorithm is given a model of the MODP and produces a coverage set. But what if the model is unknown at the time a coverage set needs to be produced? In that case, the algorithm must learn about the MODP from interaction with the environment. This is called the reinforcement learning or simply learning setting [Sutton and Barto, 1998, Wiering and Van Otterlo, 2012].
Multi-objective reinforcement learning (MORL) applies to all three of the use cases we described in the introduction (Figure 1.1): unknown weights, decision support, and known weights. However, in MORL there is another important distinction: that between offline and online learning.
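To make the learning setting concrete, the interaction loop can be sketched with a toy environment that returns *vector-valued* rewards, and an agent that learns a policy for a given linear scalarization weight vector via tabular Q-learning. This is a minimal illustration under assumed details (the `ToyEnv` environment, the hyperparameters, and the use of linear scalarization are all illustrative choices, not taken from the text):

```python
import random

class ToyEnv:
    """Hypothetical two-state, two-action environment with 2-objective rewards.

    Action 0 yields reward (1, 0) (favours objective 0); action 1 yields
    (0, 1) (favours objective 1). This toy MODP is purely illustrative.
    """
    def __init__(self):
        self.n_states, self.n_actions = 2, 2
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        reward = (1.0, 0.0) if action == 0 else (0.0, 1.0)
        self.state = 1 - self.state          # deterministic transition
        return self.state, reward

def scalarized_q_learning(env, weights, episodes=200, steps=20,
                          alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on the linearly scalarized reward w . r.

    The agent never sees the model of the MODP; it only observes
    transitions and reward vectors from interaction, as in the
    reinforcement-learning setting described above.
    """
    rng = random.Random(seed)
    Q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda x: Q[s][x])
            s2, r_vec = env.step(a)
            # scalarize the reward vector with the (here, known) weights
            r = sum(w * ri for w, ri in zip(weights, r_vec))
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

With weights `(1, 0)` the learned greedy policy prefers action 0, and with `(0, 1)` it prefers action 1. In the *unknown weights* and *decision support* scenarios, a single fixed `weights` vector is of course not available, which is precisely what motivates learning a coverage set rather than a single policy.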
In offline learning, ...