October 2019
Intermediate to advanced
366 pages
12h 4m
English
Besides the big problems of sample-efficiency and generalization, when dealing with the real world, we need to face problems such as safety and domain constraints. In fact, the agent is often not free to interact with the world due to safety and cost constraints. A solution may come from the use of constraint algorithms such as TRPO and PPO, which are embedded into the system mechanisms to limit the change of actions while training. This could prevent the agent from a drastic change in its behavior. Unfortunately, in highly sensitive domains, this is not enough. For example, nowadays, you cannot start training a self-driving car on the road straight away. The policy may take hundreds or thousands of cycles to ...
Read now
Unlock full access