Utility of sequences

The utility of a sequence is the overall reward the agent receives as it passes through a sequence of states. It is written $U(s_0, s_1, s_2, \ldots)$, where $s_0, s_1, s_2, \ldots$ is the sequence of states.

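The second assumption is that if there are two utilities, $U(s_0, s_1, s_2, \ldots)$ and $U(s_0, s'_1, s'_2, \ldots)$, such that both sequences have the same start state and

$$U(s_0, s_1, s_2, \ldots) > U(s_0, s'_1, s'_2, \ldots)$$

then

$$U(s_1, s_2, \ldots) > U(s'_1, s'_2, \ldots)$$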

This means, ...
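The assumption about two sequences with a shared start state can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the utility of a sequence to be a discounted sum of per-state rewards, which is one common choice and not necessarily the form used here. The reward table `REWARDS`, the discount factor `gamma`, and the helper names `utility` and `preference_is_stationary` are all hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical per-state rewards, invented purely for illustration.
REWARDS: Dict[str, float] = {"A": 0.0, "B": 1.0, "C": -1.0, "D": 2.0}


def utility(states: Sequence[str], gamma: float = 0.9) -> float:
    """Discounted sum of rewards along a state sequence (an assumed form of U)."""
    return sum((gamma ** t) * REWARDS[s] for t, s in enumerate(states))


def preference_is_stationary(
    seq_a: Sequence[str], seq_b: Sequence[str], gamma: float = 0.9
) -> bool:
    """Check that the ordering of two sequences with a shared start state
    is unchanged once that start state is dropped from both."""
    if seq_a[0] != seq_b[0]:
        raise ValueError("sequences must share the same start state")
    prefers_a_full = utility(seq_a, gamma) > utility(seq_b, gamma)
    prefers_a_tail = utility(seq_a[1:], gamma) > utility(seq_b[1:], gamma)
    return prefers_a_full == prefers_a_tail


# Two sequences that both start in state "A".
seq_a = ["A", "B", "D"]
seq_b = ["A", "C", "B"]

print(utility(seq_a), utility(seq_b))
print(preference_is_stationary(seq_a, seq_b))
```

Under this assumed additive form, the full utility equals the reward of the start state plus `gamma` times the utility of the remaining sequence. Removing a shared start state therefore subtracts the same amount from both utilities and rescales both by the same positive factor, so the ordering is preserved whenever `gamma` is positive.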
