Machine learning
Thanks to the flexibility of RL, it can be employed not only in standalone tasks but also as a sort of fine-tune method in supervised learning algorithms. In many natural language processing (NLP) and computer vision tasks, the metric to optimize isn't differentiable, so to address the problem in supervised settings with neural networks, it needs an auxiliary differentiable loss function. However, the discrepancy between the two loss functions will penalize the final performance. One way to deal with this is to first train the system using supervised learning with the auxiliary loss function, and then use RL to fine-tune the network optimizing with respect to the final metric. For instance, this process can be of benefit ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access