A known model
When a model is known, it can be used to simulate complete trajectories and compute the return for each of them. Then, the actions that yield the highest reward are chosen. This process is called planning, and the model of the environment is indispensable as it provides the information required to produce the next state (given a state and an action) and reward.
Planning algorithms are used everywhere, but the ones we are interested in differ from the type of action space on which they operate. Some of them work with discrete actions, others with continuous actions.
Planning algorithms for discrete actions are usually search algorithms that build a decision tree, such as the one illustrated in the following diagram:
The current ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access